Abstract

The estimation of parameters in the frequency spectrum of a seasonally persistent sta- 
tionary stochastic process is addressed. For seasonal persistence associated with a pole 
in the spectrum located away from frequency zero, a new Whittle-type likelihood is developed
that explicitly acknowledges the location of the pole. This Whittle likelihood
is a large sample approximation to the distribution of the periodogram over a chosen 
grid of frequencies, and constitutes an approximation to the time-domain likelihood of 
the data, via the linear transformation of an inverse discrete Fourier transform combined 
with a demodulation. The new likelihood is straightforward to compute, and as will be 
demonstrated has good, yet non-standard, properties. The asymptotic behaviour of the 
proposed likelihood estimators is studied; in particular, $N$-consistency of the estimator of
the spectral pole location is established. Large finite sample and asymptotic distributions 
of the score and observed Fisher information are given, and the corresponding distribu- 
tions of the maximum likelihood estimators are deduced. Asymptotically, the estimator 
of the pole after suitable standardization follows a Cauchy distribution, and for moderate 
sample sizes, we can use the finite large sample approximation to the distribution of the 
estimator of the pole corresponding to the ratio of two Gaussian random variables, with 
sample size dependent means and variances. A study of the small sample properties of the 
likelihood approximation is provided, and its superior performance to previously suggested 
methods is shown, as well as agreement with the developed distributional approximations. 
Inspired by the developments for full likelihood based estimation procedures, the use of profile
likelihood and other likelihood based procedures is also discussed. Semi-parametric
estimation methods, such as the Geweke-Porter-Hudak estimator of the long memory pa- 
rameter, inspired by the developed parametric theory are introduced. 

KEYWORDS: Periodogram; seasonal persistence; likelihood inference; Whittle likelihood.

1 Introduction



In this paper, we develop likelihood estimation of the parameters of a stationary stochastic 
process that exhibits seasonal persistence, that is, long memory behaviour associated with a 
stationary, quasi-seasonal dependence structure. We introduce a new frequency-domain like- 
lihood approximation which is computed using demodulation and which, for the first time, 
facilitates maximum likelihood estimation. We consider joint estimation of the seasonality 
and persistence parameters, and establish the asymptotic and large sample properties of the 
likelihood and its associated maximum likelihood estimators. This is in direct contrast with 
previously suggested procedures, where the distribution of the estimator of the seasonality 
parameter could not be established (Giraitis et al., 2001). The estimators are demonstrated 
to have good small sample properties compared with estimators based on the classic Whittle 
likelihood, and other non-likelihood derived estimators. Our non-standard asymptotic re- 
sults rely on the appropriate renormalization of the score and Fisher information, and utilize 
a parameter-dependent linear transformation of the data. This transformation enables an 
efficient approximation to the likelihood. The transformation also introduces a number of 
interesting and non-regular features into the likelihood surface: jumps, local oscillations, and 
non-regular large sample theory. Despite these issues the large sample theory can be deter- 
mined, and appropriate finite large sample approximations provided, as will be demonstrated. 
It transpires that the small sample properties of the estimators are competitive with existing 
methods, as will be discussed in later sections.

The contributions of this paper thus include new theory for non-regular maximum like- 
lihood problems. In similarly motivated work, Cheng and Taylor (1995) discussed problems 
associated with maximum likelihood estimation for unbounded likelihoods; in contrast, we
discuss problems associated with non-identically distributed, weakly dependent variables
whose variances are highly compressed and, as the sample size increases, unbounded. Given
the importance of compressed linear decompositions in modern statistical theory, our work 
has implications for the distribution of sparseness-inducing transformations much beyond the 
analysis of seasonal processes and Fourier theory, and forms a contribution to developing 
methodology for inference of stochastically compressible processes. 

One of the concrete and substantive conclusions of our new estimation procedures is shown
in Figure 1. Whereas a standard estimation procedure, based on the Whittle likelihood
(see Section 1.3), produces estimates that are, on average, biased
even in large samples, our new procedure, based on a carefully constructed likelihood (see
Sections 2.2 and 3), produces estimators that exhibit no such bias. Full details of this Figure
are given in Section 4.1.

1.1 Seasonally Persistent Processes

Stationary time-series models with long range dependence describe a wide range of physical
phenomena; for general discussion see Andel (1986) and Gray et al. (1989), and see also
applications in econometrics (Porter-Hudak, 1990; Gil-Alana, 2002), biology (Beran, 1994) and
hydrology (Ooms, 2001). Dependence in a stationary time series is parameterized via the
autocovariance sequence, $\{\gamma_\tau\}$. We are concerned with the estimation of parameters that specify
$\gamma_\tau$ under an assumption of seasonal persistence. Specifically, of particular importance is the
seasonality of the data characterized by a frequency, $\xi$, termed the pole, and an associated
degree of dependence, characterized by a persistence (or long memory) parameter $\delta$. Whereas
inference for the persistence parameter in the context of poles at frequency zero has been much
studied (Beran, 1994), the theoretical behaviour of estimators of the persistence parameter
remains largely uninvestigated when the underlying seasonality of the process is unknown.

Figure 1: Simulated Data: Mean standardized likelihoods for the pole (right) and the long
memory parameter (left) over 2000 simulations, with sample size of 1024, and the true values
of the long memory parameter and the pole taking the values 0.45 and 1/7, respectively. The
vertical solid lines indicate the true values of the parameters. The Demodulated likelihood
is noted in equation (16) whilst the discrete Whittle likelihood is noted in equation (8). On
average, the demodulated likelihood has its mode at the true values, whereas the Whittle
likelihood does not. See Section 4.1 for full details.

Let $\{X_t\}$ be a zero-mean, second-order stationary time series with autocovariance (acv)
sequence $\gamma_\tau = \mathrm{cov}\{X_t, X_{t+\tau}\} = E\{X_t X_{t+\tau}\}$, and spectral density function (sdf), $f(\cdot)$,

\[ f(\lambda) = \sum_{\tau=-\infty}^{\infty} \gamma_\tau e^{-2i\pi\lambda\tau}. \qquad (1) \]

The process $\{X_t\}$ exhibits seasonal or periodic persistence if there exist real numbers $H \in
(1/2, 1)$ and $\xi \in (0, 1/2)$, and a bounded function $c(\tau)$ such that

\[ \lim_{\tau\to\infty} \frac{\gamma_\tau}{c(\tau)\,|\tau|^{2H-2}} = \cos(2\pi\xi\tau), \]

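or equivalently if there exist $\beta \in (0, 1)$ and $\xi \in (0, 1/2)$ and a bounded function $c(\lambda)$ such that

\[ \lim_{\lambda\to\xi} \frac{f(\lambda)\,|\lambda-\xi|^{\beta}}{c(\lambda)} = 1. \]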

Following convention, we parameterize the persistence parameter via $\delta = \beta/2$. In line with
this definition, a process is considered to be a seasonally persistent process (SPP) if, in a
neighbourhood of $\xi$,

\[ f(\lambda) = f^\dagger(\lambda)\,|\lambda-\xi|^{-2\delta} + o(|\lambda-\xi|^{-2\delta}), \qquad (2) \]
where $f^\dagger(\lambda) = c(\lambda) > 0$, $0 < \lambda < 1/2$, is bounded above.

Parameters $(\xi, \delta)$ determine the dominant long term behaviour of the process; typically,
$\xi$ corresponds to the location of an unbounded but integrable singularity in the sdf. In this
paper we consider a parametric family of sdfs consistent with (2), that is, the parametric
model of Giraitis et al. (2001), where

\[ f(\lambda) = f_G(\lambda; \theta, \sigma_\epsilon^2) = \sigma_\epsilon^2\, h(\lambda; \theta)\,\left|1 - 2e^{-2i\pi\lambda}\cos(2\pi\xi) + e^{-4i\pi\lambda}\right|^{-2\delta} \qquad (3) \]

where $h(\lambda; \theta)$ is bounded above and below at $\lambda = \xi$, with some linear process assumptions,
given for instance in Hannan (1973); for example, $h(\cdot)$ could be the sdf of a stationary and
invertible ARMA process, as is the case for GARMA processes; see Gray et al. (1989). We
consider behaviour near the pole in such models by defining $f^\dagger(\lambda)$, where

\[ f(\lambda) = f^\dagger(\lambda)\,|\lambda-\xi|^{-2\delta} = f^\dagger(\lambda; \delta, \theta)\,|\lambda-\xi|^{-2\delta}. \qquad (4) \]

The results in this paper will also be applicable to nearly non-stationary unit root AR
processes, when the roots of the AR process approach unity at a suitable rate in the sample size,
thereby quantifying issues with near unit root processes.
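To make the behaviour near the pole concrete, the following minimal sketch evaluates a spectral density of the Gegenbauer (GARMA) type and checks numerically that $f(\lambda)|\lambda-\xi|^{2\delta}$ settles to a constant as $\lambda \to \xi$. It assumes $h(\lambda;\theta) \equiv 1$; the function names are illustrative only and are not taken from the paper. The limiting constant follows from the factorisation $|1 - 2\cos(2\pi\xi)z + z^2| = 4|\sin\{\pi(\lambda-\xi)\}\sin\{\pi(\lambda+\xi)\}|$ with $z = e^{-2i\pi\lambda}$.

```python
import cmath
import math


def gegenbauer_sdf(lam, xi, delta, sigma2=1.0):
    """Gegenbauer-type sdf with h(lambda; theta) = 1 (an assumption of this sketch)."""
    z = cmath.exp(-2j * math.pi * lam)
    factor = abs(1 - 2 * math.cos(2 * math.pi * xi) * z + z * z)
    return sigma2 * factor ** (-2 * delta)


def f_dagger_at_pole(xi, delta, sigma2=1.0):
    """Limit of f(lambda) * |lambda - xi|^(2 delta) as lambda -> xi, for h = 1."""
    return sigma2 * (4 * math.pi * math.sin(2 * math.pi * xi)) ** (-2 * delta)


if __name__ == "__main__":
    xi, delta = 1 / 7, 0.45  # parameter values used for Figure 1
    for eps in (1e-2, 1e-3, 1e-4):
        scaled = gegenbauer_sdf(xi + eps, xi, delta) * eps ** (2 * delta)
        # The ratio should approach 1 as eps shrinks.
        print(eps, scaled / f_dagger_at_pole(xi, delta))
```

The sdf itself is unbounded at $\lambda = \xi$, so the sketch only evaluates it at frequencies offset from the pole.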



1.2 Estimation for Seasonally Persistent Processes



We consider maximum likelihood estimation of $\xi$ and $\delta$, and denote the true values of these
parameters by $(\xi^*, \delta^*)$. Joint estimation of the seasonality and persistence parameters is
of importance, as inaccurate estimation of $\xi$ will affect the estimation of $\delta$, and any other
parameters of the sdf; $\delta$ quantifies the rate of decay of the dependence, and thus determines
the long-term behaviour of the series. Note also that, even in cases where $\xi$ is believed to
be known (for calendar data, equal to 1/12, or 1/7, or 1/4 say), there may on occasion
be finite sample advantage in estimating $\xi$ rather than using its known value, in terms of
estimation of the other parameters of the system. For example, if $\xi$ is regarded as a nuisance
parameter, then $\delta$ may be more efficiently estimated after conditioning on the estimate $\hat\xi$ rather than
on the known value; see, for example, Robins et al. (1994) and Rathouz et al. (2002) for supporting theory. This
issue goes beyond the scope of this paper, but gives further indication that estimation of $\xi$ is
intrinsically important.

We will examine inference for the parameters of an SPP based on a realization of the process
of length $N$. Throughout this paper, for convenience and with minimal loss of generality, we
will assume $N$ is even, $N = 2M$ say. We establish asymptotic results for the estimators
$(\hat\xi, \hat\delta)$, and provide practically useful large sample approximations to the distribution of the
estimators. In particular, we define a large sample approximation to the log-likelihood of
the periodogram evaluated at a full set of frequencies spaced $O(N^{-1})$ apart. At a local
scale the variational structure of the log-likelihood in $\xi$ remains appreciable over $O(N^{-1})$
distances; however the magnitude of these variations becomes negligible compared to the total
accumulated magnitude of the log-likelihood for increasing sample sizes. We demonstrate that
this variation prevents standard likelihood results being valid for the estimator of $\xi$, although
standard asymptotic results can be established for the estimator of $\delta$, which is in agreement
with previous results; see Hidalgo and Soulier (2004) and Giraitis et al. (2001). We discuss in
detail the large sample behaviour of $N(\hat\xi - \xi^*)$ and establish its approximate large sample
distribution, as well as a moderate sample size approximation. Finally we demonstrate that our
likelihood-based estimators have good small sample properties on simulated series compared
with other, non-likelihood estimators, and consider estimation of the system parameters in
an econometric example, using a data set of weekly gasoline sales in the United States, and
two meteorological examples, monthly temperature data from a Californian shore-station, and
the Southern Oscillation Index data set.

1.3 The Periodogram, Likelihoods and Approximations 

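We consider a sample from a stationary Gaussian time series, $X = (X_0, X_1, \ldots, X_{N-1})^T$, as
defined in section 1.1, with covariance matrix $\mathcal{G}_N = \mathcal{G}_N(\xi, \delta, \theta, \sigma_\epsilon^2)$ with $(i,j)$th element $\gamma_{|i-j|}$.
The exact log likelihood, $\ell_N$, of the finite time-domain sample is given by

\[ 2\ell_N(\xi,\delta,\theta,\sigma_\epsilon^2) = 2\log L_N(\xi,\delta,\theta,\sigma_\epsilon^2) = -N\log(2\pi) - \log|\mathcal{G}_N| - X^T\mathcal{G}_N^{-1}X. \qquad (5) \]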

This likelihood is often approximated due to the computational complexity associated with the
calculation of $\mathcal{G}_N^{-1}$. The standard approximation approach was introduced by Whittle (1951),
and the resulting, much studied, discretized approximate likelihood is commonly known as the
discrete Whittle likelihood. The Whittle likelihood gives an approximation to the likelihood of
the time domain data in the frequency domain via the Fourier coefficients, under assumptions
as specified by Beran (1994, p. 109-113, and 116-7). Problems associated with the usage
of Whittle's approximation for non-Gaussian and small sample size Gaussian time series have
been discussed by Contreras-Cristan et al. (2006).

The final two terms in equation (5) are approximated using results of Whittle (1951) and
Grenander and Szego (1984). It follows that the Whittle likelihood for $\xi$ and $\delta$ is given
by:

\[ \ell_W(\xi,\delta,\theta,\sigma_\epsilon^2) = -\int_{-1/2}^{1/2} \frac{I_0(\lambda)}{f(\lambda)}\,d\lambda, \qquad (6) \]

where $I_0(\lambda)$ is the periodogram, defined as the modulus square of the discrete Fourier transform
(DFT), $Z_0(\lambda)$, of the realized time series.

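At the Fourier frequencies $\varphi_j = j/N$, $j = 0, \ldots, M$, the periodogram, $I_0$, is given by
$I_0(\varphi_j) = |Z_0(\varphi_j)|^2$, where

\[ Z_0(\varphi_j) = \frac{1}{\sqrt{N}}\sum_{t=0}^{N-1} X_t e^{-2i\pi\varphi_j t} = A_0(\varphi_j) - iB_0(\varphi_j), \quad j = 0,\ldots,M, \qquad (7) \]

so that

\[ I_0(\varphi_j) = A_0^2(\varphi_j) + B_0^2(\varphi_j) = \frac{1}{N}\left[\sum_{t=0}^{N-1} X_t^2 + 2\sum_{t=1}^{N-1}\sum_{s=0}^{t-1} X_t X_s \cos\{2\pi j(t-s)/N\}\right]. \]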
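As a check on the two expressions for the periodogram, the sketch below computes $I_0(\varphi_j)$ once from the DFT in (7) and once from the time-domain double sum, and the two should agree to rounding error. It is a direct $O(N^2)$ implementation written for clarity rather than speed, and the function names are illustrative assumptions.

```python
import cmath
import math


def dft_coefficient(x, freq):
    """Z_0(freq) = N^(-1/2) * sum_t x[t] * exp(-2 i pi freq t), as in (7)."""
    n = len(x)
    total = sum(v * cmath.exp(-2j * math.pi * freq * t) for t, v in enumerate(x))
    return total / math.sqrt(n)


def periodogram(x, freq):
    """I_0(freq) = |Z_0(freq)|^2."""
    return abs(dft_coefficient(x, freq)) ** 2


def periodogram_time_domain(x, j):
    """I_0(j/N) from the squared terms plus twice the cosine-weighted cross terms."""
    n = len(x)
    squares = sum(v * v for v in x)
    cross = sum(
        x[t] * x[s] * math.cos(2 * math.pi * j * (t - s) / n)
        for t in range(1, n)
        for s in range(t)
    )
    return (squares + 2 * cross) / n


if __name__ == "__main__":
    x = [0.3, -1.2, 0.8, 2.0, -0.5, 0.1, 1.4, -0.9]
    for j in range(len(x) // 2 + 1):
        print(j, periodogram(x, j / len(x)), periodogram_time_domain(x, j))
```

In practice a fast Fourier transform would replace `dft_coefficient`; the direct sum is kept here so that evaluation at a non-Fourier frequency needs no further code.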

For short memory data, the periodogram is an asymptotically unbiased but inconsistent
estimator of $f(\cdot)$ that is commonly used as the basis of more sophisticated estimation procedures.
The use of (6) for parameter estimation has been discussed in detail by Walker (1964, 1965)
and Hannan (1973) under the assumption that the log spectrum integrates to zero. Hosoya
(1974) added a second term of $\log\{f(\lambda)\}$ to the integral to deal with more general processes.

For the likelihood in equation (6) to have desirable asymptotic properties, it is assumed 
that the process is linear, and satisfies certain regularity conditions, thus ensuring good large 
sample properties of the likelihood based estimators. Note that (6) is an approximation to 
the log-likelihood of X based on the periodogram, but that (6) is not a likelihood for the 
periodogram. The approximation of the likelihood in equation (5) by equation (6), performs 
well when the process is Gaussian and the covariance of the time series is either rapidly 
decaying or exactly periodic. 

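A Riemann approximation to the integral in equation (6) yields the discrete analogue

\[ \ell_W(\xi,\delta,\theta,\sigma_\epsilon^2) = -\sum_j \frac{I_0(\varphi_j)}{f(\varphi_j)}, \qquad (8) \]

and we could also adjust this to allow for more general processes:

\[ \ell_H(\xi,\delta,\theta,\sigma_\epsilon^2) = -\sum_j \left[\frac{I_0(\varphi_j)}{f(\varphi_j)} + \log f(\varphi_j)\right], \qquad (9) \]

following Hosoya's proposal. By defining the vector $C$ with $C_{2j,2j+1} = (A_j, B_j)^T$, where $A_j = A_0(\varphi_j)$
and $B_j = B_0(\varphi_j)$, and $\Sigma_C$ as the exact covariance of $C$, we may consider the exact
log-likelihood, $\ell_C$, of the DFT of observed and Gaussian data via:

\[ 2\ell_C(\xi,\delta,\theta,\sigma_\epsilon^2) = -N\log(2\pi) - \log|\Sigma_C| - C^T\Sigma_C^{-1}C \qquad (10) \]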

in direct analogue with equation (5), acknowledging finite sample effects of the DFT. The
difference between this equation and the discrete Whittle likelihood is that it involves the
exact covariance matrix, $\Sigma_C$, of the DFT coefficients. Analysis based on the likelihood of
the Fourier coefficients (in general) involves the inversion of the large, non-sparse covariance
matrix, and is thus as inefficient a basis for likelihood procedures as equation (5).
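The discrete Whittle likelihood, with and without the additional log-spectrum term proposed by Hosoya (1974), can be sketched as follows. This is a minimal illustration under stated assumptions: the periodogram ordinates and sdf values are passed in as plain lists over whichever set of Fourier frequencies is used, and the function name and flag are hypothetical.

```python
import math


def discrete_whittle(pgram, sdf_values, hosoya_term=True):
    """Return -sum_j [ I_j / f_j (+ log f_j) ] over the supplied frequencies."""
    total = 0.0
    for i_j, f_j in zip(pgram, sdf_values):
        total -= i_j / f_j
        if hosoya_term:
            total -= math.log(f_j)
    return total


if __name__ == "__main__":
    # Toy white-noise case: f is constant, so only the level s is unknown.
    pgram = [0.5, 1.5, 2.0, 4.0]
    for s in (1.5, 2.0, 2.5):
        print(s, discrete_whittle(pgram, [s] * len(pgram)))
```

In the toy white-noise case the version with the log term is maximised at the mean periodogram ordinate (here 2.0), whereas the version without it increases monotonically in the level of $f$, which illustrates why the extra term is needed when the log spectrum does not integrate to zero.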

Having specified these various likelihood functions that could be used for inference, some 
justification must be used to motivate their usage. Equation (10) is a natural choice for 
analysis of seasonal time series, given the compression of the variables of the seasonal effects. 
We shall use the compression to approximate the likelihood more carefully, acknowledging 
large finite sample effects related to the compression explicitly. 

1.4 Contributions of the Paper 

We introduce an approximation to equation (10), and use this as the basis of a maximum 
likelihood procedure. We focus on the distribution and other properties of the periodogram, 
given an underlying SPP with sdf $f(\cdot)$. We focus on Gaussian processes, and do not consider
here the non-Gaussian case. However, for other processes, such as those in Brillinger (1975), 
where asymptotic normality of the DFT holds, our distributional results are still valid. 

Specifically, we consider estimation of parameters of spectra with spectral poles away from
frequency zero. We consider an adjustment to the standard DFT that simplifies the technical
developments of this paper. A simple (but parameter dependent) modification of the choice
of grid, conditional on a known spectral pole location, leads to simple approximations to the
likelihood of the periodogram at a new set of frequencies spaced at a distance $O(N^{-1})$ apart.
In particular:

1. We propose a new demodulated discrete Whittle likelihood for seasonal processes (Sections
2 & 3). We show that the proposed likelihood approximates the distribution of the
discrete Fourier transform for any posited value of the true parameters (see Theorem 1). 
The key idea is to use a different orthogonal transformation of the data conditional on 
each fixed value of the location of the pole (specification of a compressed representation). 
This is a non-standard situation. 

2. To establish the properties of the likelihood we calculate the large finite sample distri- 
bution of the periodogram at the pole itself (Section 2.3). 

3. We bound the covariance of the demodulated periodogram at different frequencies spaced 
$1/N$ apart (noted in Section 3), and note its asymptotically negligible contribution to the
normalized log-likelihood. Furthermore, the choice of approximation to the likelihood is 
not everywhere continuous. However, we demonstrate (Section 3) that the discontinuities 
in the likelihood surface represent a negligible contribution for finite large samples. 

4. We prove consistency of the MLEs (see Theorem 2), and determine the large sample 
first order properties of the score and observed Fisher information (Theorem 3). 

5. We determine the asymptotic distribution of the score and observed Fisher information 
(see Theorem 4) and the asymptotic distribution of the MLEs (see Theorem 5). 

6. We give a large finite sample approximation to the distribution of the pole estimator 
(see Proposition 6). 

To derive the appropriate large sample theory, some care is required. It transpires that
the score and Fisher information do not exhibit the usual large sample behaviour. Our results
are based on a Taylor expansion of the log-likelihood; we adopt the normalization of the
observed Fisher information used by Sweeting (1980, 1992), and thus renormalize the
observed Fisher information with a suitable power of $N$. The renormalized
score and observed Fisher information converge in law to Gaussian random variables that
are asymptotically uncorrelated. The distribution of $\hat\xi$ converges slowly to the asymptotic
distribution, and so alternative large finite sample approximations are also given.

These results establish a new large sample theory for seasonally persistent processes. They
utilize the data-dependent transformation of the time-domain data, which facilitates the
computation of the distribution of different random variables for each posited value of the pole, and
appropriate normalisation techniques for the score and Fisher information when the data are
modelled as highly compressed in the Fourier domain.

1.5 Connections with Recent Work 

In connection with other related work, we distinguish between likelihood-based methods and
semi-parametric methods for processes exhibiting seasonal persistence. Giraitis et al. (2001)
consider fully parametric models, and constrain the maximization over the location to a grid
of frequencies spaced $O(N^{-1})$ apart. Hidalgo and Soulier (2004) consider semi-parametric
models, and the theoretical properties of the extended Geweke-Porter-Hudak estimator, basing
their analysis on estimating the location of the singularity as the Fourier coefficient of the
maximum periodogram value in a given frequency interval; in their simulation study, the true
location of the singularity is aligned with the Fourier frequency grid. Hidalgo and Soulier
(2004) evaluate the Fourier coefficients at the Fourier frequency grid, and restrict the estimate
of the location of the pole to a grid of frequencies spaced $O(N^{-1})$ apart. Hidalgo (2005) used
semi-parametric methods to estimate the location of the pole, as well as the long memory 
parameter. By using a two-step procedure he is able to develop large sample theory for the 
estimator of the singularity, whereas in contrast we focus on full likelihood methods. More 
recently, Whitcher (2004) used a wavelet packet analysis approach for estimation of seasonally 
persistent processes. 

In terms of asymptotic properties, our rate of convergence matches that of Giraitis et al. 
(2001). However, in addition, we obtain the large sample distributional results for the esti- 
mator of the pole, which they do not, having produced a different estimator. Similarly to
Giraitis et al. (2001), Beran and Ghosh (2000) estimate the location of the pole using the coefficient
which maximises the periodogram. Our estimator is again different although asymptotically 
equivalent with the same rate of convergence, and it has a determinable asymptotic, as well 
as large finite sample approximate, distribution. 

Our work also has a connection with, but is different in spirit from, hidden frequency 
estimation, in which the seasonal structure is modelled as deterministic, corresponding to a 
single sinusoid. In this case, the Fourier coefficient which maximizes the periodogram converges 
to the true coefficient with a faster rate than the convergence of the MLE of the pole. Such 
rates were improved by secondary analysis, and the corresponding analysis using data tapers, 
see for example Chen et al. (2000), Hannan (1973, 1986) and von Sachs (1993). Secondary analysis
corresponds to partitioning the time series into several groups of data, and using regression
to estimate the so-called hidden frequency. Thomson (1990) used multitaper methods to
improve the detection of a set of hidden frequencies, and used least squares methods over a
given bandwidth. Neither the model we use, nor our proposed inferential method, is equivalent
to the above mentioned procedures. Secondary analysis can be considered to 'zoom in' on
local structure near the pole, and may be philosophically related to our procedure, but we 
implement full likelihood for a full set of Fourier coefficients. Conditionally for each fixed 
value for the pole, we calculate the distribution of a different set of random variables, but as 
each set is a linear and orthogonal transformation of the original data, and with a constant 
and equal Jacobian, this is appropriate. 

Finally, we note that the inferential issues are of importance beyond seasonally persistent 
processes. The inherent non-regularity arises due to a parameter dependent transformation of 
the time-domain data. Whenever the process is modelled using a suitable parametric linear 
transformation of the data that will give decomposition coefficients that are non-negligible only 
for a few sets of indices, our methods will be applicable with some minor modifications. In 
a more general setting we would write the variances of a set of basis coefficients as satisfying 
a power-law decay, and we refer to such processes as second order compressive processes. 
Power-law decay in a suitable basis is a relatively common phenomenon (see for example
the discussion in Donoho (2006), Abramovich et al. (2006) and Candes and Tao (2004)), and
our developments will carry across to this setting if the compression is stochastic rather than
deterministic, once the location and decay parameters have been incorporated in the arbitrary
basis. Issues of alignment, and/or shift-variance, akin to results that arise for misspecified
location of the pole, are very well-documented in other basis expansions (Coifman and Donoho,
1995). Note that the equivalent to the decay parameter discussed by the aforementioned
authors will be $p = 1/(2\delta)$. Only for $\delta > 0.25$ are we in their mode of decay, corresponding to
extreme regimes of long memory behaviour.



2 Distributional results for the Periodogram 



2.1 Large Sample Properties

The large sample properties of the periodogram of seasonally persistent processes were deter- 
mined in Olhede et al. (2004). We summarize and extend these results below; in particular 
we compute the statistical properties of the periodogram itself at the pole $\xi$, as this specific
Fourier coefficient will contribute substantively to the subsequent likelihood calculation. 

Theorem 1 in Olhede et al. (2004) gives the following result concerning the relative bias
at frequency $\lambda$, $B_{\lambda,N}(\xi,\delta)$, of the periodogram for all $\lambda \in (-1/2, 1/2)$, $\xi \in (0, 1/2)$,



/(A) 



A = ^ 



This notation makes explicit the dependence of the relative bias on $\xi$ and $N$. For frequencies
$\varphi_k = k/N$, $k \in \{0, \ldots, M\}$, we have, for large $N$ and a fixed value of $\xi$, with $\varphi_k \ne \xi$,

\[ B_{\varphi_k,N}(\xi,\delta) = \frac{2}{\pi}\int_{-\infty}^{\infty} \frac{\sin^2\{u/2 - \pi c_N(\xi,\varphi_k)\}}{\{u - 2\pi c_N(\xi,\varphi_k)\}^2} \left|\frac{2\pi c_N(\xi,\varphi_k)}{u}\right|^{2\delta} du + o(1), \qquad (11) \]

where $c_N(\xi,\varphi_k) = N(\varphi_k - \xi)$ denotes $N$ times the distance between the $k$th Fourier frequency
and the pole at $\xi$. For the case $\varphi_k = \xi$, the large sample value of $B_{\xi,N}(\xi,\delta)$ is given in Lemma
2.1 in Section 2.4.



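For the second order moment properties, let

\[ V_{\varphi_k,\varphi_l,N}(\xi,\delta) = \frac{2}{\pi}\int_{-\infty}^{\infty} \frac{\sin\{u/2 - \pi c_N(\xi,\varphi_k)\}\,\sin\{u/2 - \pi c_N(\xi,\varphi_l)\}}{\{u - 2\pi c_N(\xi,\varphi_k)\}\{u - 2\pi c_N(\xi,\varphi_l)\}} \left|\frac{2\pi}{u}\right|^{2\delta} |c_N(\xi,\varphi_k)\,c_N(\xi,\varphi_l)|^{\delta}\,du + o(1). \]

Then, for $A_0(\varphi_j)$, $B_0(\varphi_j)$ from (7), Olhede et al. (2004) gives

\[ E\{A_0(\varphi_k)A_0(\varphi_l)\} = E\{B_0(\varphi_k)B_0(\varphi_l)\} = \{V_{\varphi_k,\varphi_l,N}(\xi,\delta)/2 + o(1)\}\sqrt{f(\varphi_k)f(\varphi_l)}, \]
\[ E\{A_0(\varphi_k)B_0(\varphi_l)\} = E\{B_0(\varphi_k)A_0(\varphi_l)\} = o(1)\sqrt{f(\varphi_k)f(\varphi_l)} = o(N^{2\delta}) \quad \text{if } c_N(\xi,\varphi_k),\, c_N(\xi,\varphi_l) = O(1). \]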

These results specify the large sample first and second order structure of the periodogram. 
We now extend these results to the demodulated periodogram described in section 2.2. Note 
that a direct implication of these results is that the distribution of the periodogram is highly 
dependent on the distances between the pole $\xi$ and the Fourier frequencies $\{\varphi_k\}$.



2.2 The Demodulated Discrete Fourier Transformation 

The Discrete Fourier Transform of $\{X_t\}$ is not constrained to be evaluated at $\{\varphi_k\}$, but in
fact any $O(N^{-1})$ grid could be considered. This fact leads us to consider demodulation, a grid
realignment technique, which for any fixed value of $\xi$ produces a new grid aligned with the
pole. Demodulation ensures that the large sample behaviour of the demodulated periodogram
is similar to that of the periodogram of a standard long memory process (where $\xi = 0$).
Specifically, the large sample bias is the same but the distribution of the periodogram is $\chi^2_2$
rather than a sum of unequally weighted $\chi^2_1$ random variables (see Hurvich and Beltrao, 1993;
Olhede et al., 2004, p. 621).

The Demodulated Discrete Fourier Transform (DDFT) or offset DFT (Pei and Ding, 2004)
of a sample of size $N$ from time series $\{X_t\}$ with demodulation via a fixed frequency $\lambda$ is
denoted $Z_\lambda$, and is defined for Fourier frequency $\varphi_j$ by

\[ Z_\lambda(\varphi_j) = \frac{1}{\sqrt{N}}\sum_{t=0}^{N-1} X_t e^{-2i\pi(\varphi_j+\lambda)t} = A_\lambda(\varphi_j) - iB_\lambda(\varphi_j), \quad j = 0,\ldots,M. \qquad (12) \]

The demodulated periodogram at frequency $\varphi_j$ with demodulation via $\lambda$ is denoted $I_\lambda(\varphi_j)$,
and is defined via the ordinary periodogram $I_0$ by

\[ I_\lambda(\varphi_j) = I_0(\varphi_j + \lambda) = |Z_\lambda(\varphi_j)|^2 = A_\lambda^2(\varphi_j) + B_\lambda^2(\varphi_j). \]

Hence $I_\lambda(\varphi_j)$ is simply the periodogram evaluated at frequency $\varphi_j + \lambda$, that is, $I_0(\varphi_j + \lambda)$. We
will consider evaluating this expression at arbitrary frequency $\varphi$. We define $C_{\lambda;2j,2j+1} =
(A_{\lambda,j}, B_{\lambda,j})^T = \{A_\lambda(\varphi_j), B_\lambda(\varphi_j)\}^T$, in analogue to $C$ in equation (10). For Gaussian data $X$
we then find:

\[ C_\lambda = (A_{\lambda,0}, B_{\lambda,0}, \ldots, A_{\lambda,M}, B_{\lambda,M})^T \sim \mathcal{N}(0, \Sigma_{C_\lambda}). \qquad (13) \]

Note that due to the demodulation, $B_{\lambda,0} \ne 0$ in general, unlike the imaginary component
of the DFT at frequency zero. To efficiently formulate the likelihood, we need to explicitly
consider the computation of $\Sigma_{C_\lambda}$, the covariance of the DDFT coefficients.



2.3 Extending the Olhede et al. (2004) result

The results in Olhede et al. (2004) do not cover the case of demodulation, and to enable
calculation of the new likelihood, further results are required. For example $B_{\xi,N}(\xi,\delta)$ needs to
be explicitly determined. To minimize the bias in the demodulated periodogram, and simplify
the covariance structure, we shift the Fourier grid so that the closest Fourier frequency to
the pole in the original grid coincides exactly with the pole in the demodulated version.
For a pole at $\xi$, we denote by $j_{0,N}(\xi) = [N\xi]$, where $[x]$ indicates the nearest integer to
$x$. We furthermore let $c_N(\xi, \varphi_{j_{0,N}(\xi)}) = j_{0,N}(\xi) - N\xi$, and specify $\lambda = \lambda_{D,N}(\xi)$ in (12) as
$\lambda_{D,N}(\xi) = -c_N(\xi, \varphi_{j_{0,N}(\xi)})/N$. This approach introduces a new grid of frequencies, namely

\[ \lambda_k = \lambda_{k(j),N}(\xi) = \varphi_j + \lambda_{D,N}(\xi) = \xi + \frac{j - j_{0,N}(\xi)}{N} = \xi + \frac{k}{N}. \qquad (14) \]

We exclude Fourier frequencies 0 and 1/2, and taking $j = 1, \ldots, M-1$ we have $k = k(j) =
j - j_{0,N}(\xi) = J_1, \ldots, J_2 = -j_{0,N}(\xi), \ldots, M - j_{0,N}(\xi)$. For example, if $N = 16$ and $\xi = 0.15$,
then $\lambda_{D,16}(0.15) = 0.025$, $[N\xi] = 2$, $J_1 = -1$ and $J_2 = 5$. Note that for $k(j_2) > k(j_1) \ge 0$,
then

so that the covariance properties of the DDFT can be easily determined.

Under this demodulation, the DDFT yields the original periodogram $I_0$ evaluated at
frequencies $\lambda_k = \lambda_{k(j)} = \xi + (j - j_{0,N}(\xi))/N$, and takes the form

\[ Z_{\lambda_D}(\varphi_j) = Z_0(\lambda_{k(j)}) = \frac{1}{\sqrt{N}}\sum_{t=0}^{N-1} X_t e^{-2i\pi\lambda_k t}, \quad k = J_1,\ldots,J_2, \qquad (15) \]

so that, for $k = J_1,\ldots,J_2$, $I_{\lambda_D}(\varphi_j) = I_0(\lambda_k) = I_0(\xi + k/N)$. The DDFT can be computed
efficiently by applying the DFT to the new series defined for $t = 0, \ldots, N-1$, by
$Y_t = X_t\exp\{-2\pi i\lambda_D t\}$. Demodulation both simplifies the mathematical calculations
considerably, and improves estimation of the persistence parameter $\delta$. Naturally the operation is
very straightforward to implement. The parameter dependent choice of $\{\lambda_k\}$ will need careful
analysis when deriving properties of the parameter estimators.
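The demodulation step can be sketched in a few lines. The code below computes the nearest Fourier index to the pole, the offset $\lambda_D$, and the DDFT obtained by applying an ordinary DFT to $Y_t = X_t\exp\{-2\pi i\lambda_D t\}$. It is a minimal sketch under stated assumptions: the function names are illustrative, halves are rounded up in the nearest-integer operation, and the index range follows the worked example in the text ($N = 16$, $\xi = 0.15$, frequencies 0 and 1/2 excluded, giving $k = -1, \ldots, 5$).

```python
import cmath
import math


def nearest_integer(x):
    """[x]: nearest integer, with halves rounded up (a convention assumed here)."""
    return math.floor(x + 0.5)


def demodulation_grid(n, xi):
    """Return (j0, lambda_D, ks) for an even sample size n and a pole at xi."""
    m = n // 2
    j0 = nearest_integer(n * xi)        # index of the Fourier frequency closest to xi
    lam_d = xi - j0 / n                 # offset that moves that frequency onto the pole
    ks = [j - j0 for j in range(1, m)]  # j = 1, ..., M-1, so 0 and 1/2 are excluded
    return j0, lam_d, ks


def dft(x, freq):
    """N^(-1/2) * sum_t x[t] * exp(-2 i pi freq t)."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * freq * t) for t, v in enumerate(x)) / math.sqrt(n)


def ddft(x, xi):
    """DDFT coefficients keyed by k, so that the entry for k equals Z_0(xi + k/N)."""
    n = len(x)
    j0, lam_d, ks = demodulation_grid(n, xi)
    y = [v * cmath.exp(-2j * math.pi * lam_d * t) for t, v in enumerate(x)]
    return {k: dft(y, (k + j0) / n) for k in ks}


if __name__ == "__main__":
    print(demodulation_grid(16, 0.15))
```

Because the demodulated series is evaluated on the ordinary Fourier grid, a fast Fourier transform could replace the direct sum in `dft` without changing the result.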



2.4 Expectation of the Periodogram at the Pole 

The result in (11) gives the relative bias of the periodogram. The expectation of the periodogram
at the pole is given in the following Lemma.

Lemma 2.1 The expected value of the periodogram evaluated at the pole $\xi$, after
demodulation by $\xi$, is

\[ E\{I_0(\xi)\} = (2\pi N)^{2\delta}\{-2f^\dagger(\xi)\Gamma(-1-2\delta)\}\cos\{\pi(1/2+\delta)\}\pi^{-1} + o(1). \]

Proof: See Appendix A.1. ■
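The leading term of Lemma 2.1 is straightforward to evaluate numerically. The sketch below assumes that the value $f^\dagger(\xi)$ is supplied by the user and omits the $o(1)$ remainder; the function name is illustrative only. Since $\cos\{\pi(1/2+\delta)\} = -\sin(\pi\delta)$ and $\Gamma(-1-2\delta) > 0$ for $0 < \delta < 1/2$, the leading term is positive, it grows as $N^{2\delta}$, and it tends to $f^\dagger(\xi)$ as $\delta \to 0$.

```python
import math


def expected_periodogram_at_pole(n, delta, f_dagger):
    """Leading term of E{I_0(xi)} in Lemma 2.1, without the o(1) remainder."""
    if not 0.0 < delta < 0.5:
        raise ValueError("delta must lie in (0, 1/2)")
    scale = (2 * math.pi * n) ** (2 * delta)
    bracket = -2 * f_dagger * math.gamma(-1 - 2 * delta)
    return scale * bracket * math.cos(math.pi * (0.5 + delta)) / math.pi


if __name__ == "__main__":
    for n in (256, 1024, 4096):
        print(n, expected_periodogram_at_pole(n, 0.45, 1.0))
```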



3 Asymptotic Properties of the Likelihood and Estimators 

In this section we utilize demodulation, and the large sample approximations described above,
to present three theorems that characterize the asymptotic behaviour of the likelihood, the
corresponding MLEs for $(\xi, \delta)$ and the associated Fisher information to obtain their large
sample properties. Specifically, we establish $N$-consistency for the estimator of the location
of the pole, thus matching the result of Giraitis et al. (2001).

3.1 Large-sample Likelihood Approximation

For a periodogram demodulated to align the Fourier grid with pole $\xi$, we have the following
asymptotic result.

Theorem 1 Approximating the Likelihood Function.

For a Gaussian series from a periodic long memory model as described by (3), where $f(\cdot)$ is
twice partially differentiable with respect to $(\xi, \delta)$, the log-likelihood of the discrete Fourier
transform can be approximated by

\[ \ell(\xi,\delta,\theta,\sigma_\epsilon^2) = \sum_{j=J_1}^{J_2}\{\log\eta_j - \eta_j I_0(\xi + j/N)\} \qquad (16) \]

accurate to $o(N)$, where



I -p^TOVO} 



for < 5 < 0.5, where T{74} is the indicator function for event A, 

/j^/^(A,) = /^(^ + i/A^) 
and B^{^,5) is the asymptotic relative bias given by Lemma 2.1. 

Proof: See the Appendices A.2-A.4. ■ 



Note I : The approximation to the likelihood is equivalent to that of independent exponential
random variables with rate parameters $\eta_j$ that depend on $j$ and $\delta$ but not on $\xi$. In equation
(17), the function $B_\xi(\xi, \delta)$ appropriately scales the periodogram contribution from the Fourier
frequency aligned with $\xi$. $B_\xi(\xi, \delta)$ is monotonically increasing in $\delta$, with
$\lim_{x \to 0} B_\xi(\xi, x) = 1$, and $B_\xi(\xi, \delta)$ is bounded away from zero. As the function is
monotonic, the derivatives of $B_\xi(\xi, \delta)$ are also bounded away from zero. If $f^{\dagger}(\cdot)$ is also
bounded away from zero, then the log-likelihood is bounded in $\xi$ and $\delta$. Thus it is possible to
find the MLEs of $\xi$ and $\delta$ efficiently by numerical methods.

Note II : This likelihood is not differentiable with respect to $\xi$ at all values of $\xi$; although
$I_0(\xi + j/N)$ is available in simple form, the dependence of $J_1 = -j_{0,N}(\xi)$ and
$J_2 = M - j_{0,N}(\xi)$ on $\xi$ renders the overall function discontinuous. However, the discontinuities
are $O(1)$ in magnitude, and the log-likelihood is uniformly at least $O(N)$, so the discontinuities
are in fact negligible. They nevertheless motivate us to look, in standard fashion, at the
$N$-standardized likelihood function $\ell(\xi, \delta, \theta, \sigma_\epsilon^2)/N$. See Appendix A.4 for further details.
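Read as a likelihood for independent exponential ordinates (Note I), the sum in equation (16) is simple to evaluate once the rates and the demodulated periodogram are available. The sketch below is illustrative only: it takes the rates $\eta_j$ as given inputs rather than computing them from equation (17), and the function names are hypothetical.

```python
import math


def exponential_whittle_loglik(rates, periodogram):
    """Sum of log(eta_j) - eta_j * I_j over the supplied ordinates.

    This is the log-likelihood of independent exponential variables I_j with
    rate parameters eta_j, which is the form taken by equation (16).
    """
    if len(rates) != len(periodogram):
        raise ValueError("rates and periodogram must have the same length")
    return sum(math.log(eta) - eta * i_j for eta, i_j in zip(rates, periodogram))


def standardized_loglik(rates, periodogram, n):
    """The N-standardized log-likelihood l/N of Note II; n is the sample size."""
    return exponential_whittle_loglik(rates, periodogram) / n
```

For a single ordinate with $I_j = 2$, the sum is maximized at $\eta_j = 1/2$, the usual exponential maximum likelihood estimate of the rate.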

Note III : The formulation in Theorem 1 summarizes the data in the frequency domain 
via the demodulated periodogram for a given $\xi$. We avoid the introduction of the substantial
bias and covariance terms found in Olhede et al. (2004), as the demodulated periodogram is
perfectly aligned with this singularity. When other demodulations are chosen the likelihood
cannot be approximated in such a fashion. Even for frequencies of sufficient distance from 
any irregular behaviour, the results of Olhede et al. (2004) cannot be applied directly, and 
to find the approximate Whittle likelihood we additionally need to make assumptions about 
the spectral density function, and its smoothness (see Dzhamparidze and Yaglom (1983) and 
Taniguchi and Kakizawa (2000)). 

Note IV : The result differs from that of Hurvich and Beltrao (1993) in a number of
ways. The ordinates subscripted $j$ and $-j$ in the DFT are no longer complex conjugates,
and the likelihood evaluated at $\lambda_j$ is now approximately $\chi^2_2$ (rather than a mixture of two
different terms) even for those coefficients closest to the pole. Strictly, the definition for
$\eta_j$ in equation (17) has additional terms indexed by $j, k \in \mathbb{Z}$, but these terms can
be bounded appropriately, and thus contribute in a negligible fashion. The bias at the pole
reported in Hurvich and Beltrao (1993) is (identically) present in our formulation, but is $o(N)$,
and is thus subsumed into the final term; see Hurvich et al. (1998) for relevant supporting
arguments.



3.2 Existence and Consistency of the ML estimators 

We now use the results of the previous section to construct likelihood-based estimators of
$\xi$ and $\delta$ and establish their properties. The following theorem establishes the existence and
consistency of the ML estimators derived from the likelihood in Theorem 1.

Theorem 2 Existence and Consistency

For the likelihood of Theorem 1, the ML estimators of $\xi$ and $\delta$, $\hat\xi$ and $\hat\delta$, exist and are
consistent, with convergence rates $N$ and $N^{1/2}$ respectively.

Proof: See Appendix A.5. ■ 

The $N$-consistency of $\hat\xi$ matches the convergence rate of Giraitis et al. (2001). It is unusual
to find superconsistent estimators in likelihood based procedures. An intuitive understanding
of the rate can be found in the time domain. As we collect $N\xi^*$ full periods of the data, the
periodicity of the data is determined to an accuracy of $O(N^{-1})$. The reason why this rate is
achieved is that the log-likelihood is varying $O(N^{1/2})$ (see Proposition 14) over distances in $\xi$ of
$O(N^{-1})$ near the value $\xi = \xi^*$. However, the convergence rate is different to that of Chen et al.
(2000). The latter model the seasonality as a deterministic seasonal component embedded in
stationary noise. In Chen et al. a regression model is employed to estimate the amplitude
and locations of the seasonality, and a rate of $N^{3/2}$ rather than $N$ is achieved. We employ
a different model and hence do not expect the same convergence rates as are achieved by Chen
et al.



3.3 Properties of The Fisher Information Matrix 



Theorem 3 The Fisher Information. 

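For a series from a periodic long memory model as described by (3), for large $N$, the
components of the Fisher information

$$\mathcal{F}^{(N)} = \begin{pmatrix} \mathcal{F}^{(N)}_{\xi,\xi} & \mathcal{F}^{(N)}_{\xi,\delta} \\ \mathcal{F}^{(N)}_{\xi,\delta} & \mathcal{F}^{(N)}_{\delta,\delta} \end{pmatrix}$$

are given by [the expressions for the $\xi$ components are illegible in the source]

$$\mathcal{F}^{(N)}_{\delta,\delta} = \mathcal{J}_{\delta,\delta} N + o(N),$$

where $\mathcal{J}_{\delta,\delta}$, $\mathcal{J}_{\xi,\delta}$ and $\mathcal{J}_{\xi,\xi}$ are constants independent of $N$ but are functions of the
true values of $\xi$ and $\delta$.

Proof: See Appendices A.6 and A.7. ■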



For a full analysis in a regular ML setting, the second order properties of the MLEs can in 
a general setting be deduced from the above quantities. Large sample properties, specifically 
consistency and asymptotic variance, may be considered via a Taylor expansion of the log- 
likelihood, see for example Cheng and Taylor (1995). However, we note that we are not 
in a standard setting; even if we may expand the log-likelihood near the true value of the 
parameter, because of the non-standard behaviour of the derivatives of the log-likelihood, the 
observed Fisher information does not converge to a diagonal matrix with constant entries, 
but rather the (appropriately standardized) observed Fisher information for $\xi$ converges to
a random variable with order one variance. We will discuss the interpretation of the Fisher 
information in this context, in the appendix. For the derivatives involving the location of the 
pole, extra terms of magnitude N are introduced and thus the variance of the observed Fisher 
information in $\xi$ is $O(N^4)$. The magnitude of the variance of the observed Fisher information
in $\xi$ implies that a standardization of the random variable must be employed that results in
a negligible expectation of the restandardized random variable. We also therefore discuss the 
large sample theory of the observed Fisher information. 



3.4 The Asymptotic Properties of the MLEs 



We now consider the use of the Fisher information to determine the asymptotic variance.
Consider a Taylor expansion of the score near the true value $\psi^*$ of the parameters $\psi = (\xi, \delta)$,
evaluated at the MLE $\hat\psi$. We denote the observed Fisher information by $F_N(\psi)$, and let $\psi'$
lie between $\hat\psi$ and $\psi^*$. We denote by $\dot\ell(\psi)$ the score in $\psi$, noting that the score is well defined
if the log-likelihood is evaluated ignoring the $\xi$ dependence of $J_1$ and $J_2$, see Section A.4:

$$\dot\ell(\psi) = \left( \frac{\partial \ell}{\partial \xi}, \frac{\partial \ell}{\partial \delta} \right)^T. \tag{18}$$

Then using a first-order expansion of the log-likelihood in the usual way for $N$ sufficiently
large, we have the (vector) score function:

$$\dot\ell(\hat\psi) = \dot\ell(\psi^*) - F_N(\psi')(\hat\psi - \psi^*) \;\Rightarrow\; F_N(\psi')(\hat\psi - \psi^*) = \dot\ell(\psi^*) - \dot\ell(\hat\psi).$$

Thus the difference between $\hat\psi$ and $\psi^*$, when appropriately scaled by the random matrix
$F_N(\psi')$, corresponds to the value of the score at $\psi^*$ in the usual fashion. The statistical
properties of $F_N(\psi')$ are not straightforward in this non-regular problem, and require further
investigation. Following Sweeting (1992), we define a suitable standardization matrix $B_N$ and
the standardized observed Fisher information $W_N$ by

$$W_N = B_N^{-1/2} F_N B_N^{-1/2}, \tag{19}$$

where the entry of $B_N$ corresponding to $\delta$ is $N \mathcal{J}_{\delta,\delta}$ [the display defining $B_N$ is otherwise
illegible in the source]. The large sample properties of $W_N$ are tractable, and their determination
is an important step in finding the large sample properties of the MLE. Specifically, we let

$$B_N^{-1/2} F_N(\psi') B_N^{-1/2} B_N^{1/2} (\hat\psi - \psi^*) = B_N^{-1/2} \{\dot\ell(\psi^*) - \dot\ell(\hat\psi)\},$$

so that

$$W_N(\psi') B_N^{1/2} (\hat\psi - \psi^*) = B_N^{-1/2} \dot\ell(\psi^*) = k_N(\psi^*).$$

The latter expression defines the standardized score $k_N(\cdot)$. See the Appendix for a full
discussion of these quantities. Note that $B_N$ is the large $N$ approximation to the Fisher information
matrix for $\delta$ and corresponds to an appropriate order normalisation for $\xi$; thus $W_N(\psi)$ is the
observed Fisher information renormalized by $B_N$.

We note that for $N$ large enough, the expected value of $W_N(\psi^*)$ is the identity matrix
for the $\delta$ entry, but the expectation of the first entry of $W_N(\psi^*)$ is $o(1)$ whilst the variance
of the first entry is $O(1)$. In a standard setting the expectation is $O(1)$ and the variance $o(1)$.



Theorem 4 Distribution of the Score and Observed Fisher Information. 

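For the likelihood of Theorem 1, the standardized score $k_N(\psi^*)$ and the standardized
observed Fisher information matrix $W_N(\psi^*)$ asymptotically have the following properties:

$$k_N(\psi^*) \xrightarrow{\mathcal{L}} k, \qquad W_N(\psi^*) \xrightarrow{\mathcal{L}} W, \tag{20}$$

where the entries of $k = (k_1, k_2)^T$ and $W$ are uncorrelated and

$$W_{11} \sim \mathcal{N}(0, 8\pi^4/15), \quad W_{12} = 0, \quad W_{22} = 1, \qquad k_1 \sim \mathcal{N}(0, \pi^2/3), \quad k_2 \sim \mathcal{N}(0, 1). \tag{21}$$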



Proof: An outline of the proof is given in the Appendix, see Proposition 6, Section A.8.3 and
Proposition 9. ■

Theorem 5 Distribution of the MLE.

For the likelihood of Theorem 1, the ML estimators of $\xi$ and $\delta$, $\hat\xi$ and $\hat\delta$, have distributions
that for large samples approximately take the form:

[display for $\hat\xi$ illegible in the source]

where $C$ is distributed according to the standard Cauchy distribution, and

$$\sqrt{N}(\hat\delta - \delta^*) \overset{A}{\sim} \mathcal{N}\left(0, N \left[\mathcal{F}^{(N)}_{\delta,\delta}\right]^{-1}\right). \tag{22}$$

An estimator of the asymptotic variance is formed via $\hat{\mathcal{J}}_{\delta,\delta} = \mathcal{J}_{\delta,\delta}(\hat\xi, \hat\delta)$. The forms of
the constants, including $\mathcal{J}_{\delta,\delta}(\xi, \delta)$, are given in the Appendix.

Proof: An outline proof is given in the Appendix, see Proposition 9 and Section A.8.1.
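The Cauchy limit can be made plausible with a small simulation. Theorem 4 gives zero-mean Gaussian limits for $k_1$ and $W_{11}$, and the ratio of two independent zero-mean Gaussian variables is Cauchy with scale equal to the ratio of their standard deviations. The sketch below checks this through the quartiles, which for a Cauchy variable with scale $s$ lie at $\pm s$. The standard deviations used are arbitrary illustrative values, not those of the theorem, and the function names are hypothetical.

```python
import random


def ratio_of_gaussians(n_draws, sd_num, sd_den, seed=0):
    """Draw n_draws ratios of independent zero-mean Gaussian variables."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd_num) / rng.gauss(0.0, sd_den) for _ in range(n_draws)]


def empirical_quantile(sorted_values, p):
    """Simple order-statistic quantile of an already sorted sample."""
    index = min(len(sorted_values) - 1, max(0, int(p * len(sorted_values))))
    return sorted_values[index]


if __name__ == "__main__":
    draws = sorted(ratio_of_gaussians(200000, sd_num=2.0, sd_den=0.5, seed=1))
    # For a Cauchy variable with scale 2.0 / 0.5 = 4 the quartiles are -4 and +4.
    print(empirical_quantile(draws, 0.25), empirical_quantile(draws, 0.75))
```

Quantiles rather than moments are compared because the Cauchy distribution has no mean or variance.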



Note : Giraitis et al. (2001) do not find the limiting distribution of their estimator, $\hat\xi_G$, of $\xi$.
They note that this is an artefact of the maximization over the specified grid. This constraint
is not enforced in our approach. Note that $E(|\hat\xi - \hat\xi_G|) = O(N^{-1})$ but that $N|\hat\xi - \hat\xi_G|$ is not
constrained to be zero or even to have a tractable distribution; this result is demonstrated
empirically in the simulations. The convergence to the Cauchy for extreme values of $\delta$ is quite
slow. We therefore provide, in the Appendix, a second approximation to the distribution of the
renormalized estimator of the pole, obtained by more carefully approximating the dominant
contributions to the mean and variance of the numerator and denominator that define the random
variable the estimator follows.

To compare the large sample and asymptotic forms of the distributions, we refer
to Figure 2 (a) and (b). As $\delta$ increases in magnitude it takes longer for the large sample
approximation to be close to the asymptotic distribution, as is obvious from these plots. For a
list of critical values of the distribution see Table 10.

The likelihood of the data changes in magnitude dramatically depending on the value
of $\xi$ and its alignment with the grid of frequencies at which the periodogram is evaluated.
Determination of the best value of $\xi$ is therefore pivotal for characterizing the system, and must
be the first stage of any analysis. For completeness we now discuss the estimation of the additional
parameters, i.e. $\theta$ and $\sigma_\epsilon^2$.



3.5 White Noise Variance and Nuisance Parameters



We now consider the estimation of the white noise component and regular spectral component.
We model the sdf parametrically by

$$f(\lambda) = \frac{\sigma_\epsilon^2 \, h(\lambda; \theta)}{|2\cos(2\pi\lambda) - 2\cos(2\pi\xi)|^{2\delta}}.$$

Differentiating $\ell(\xi, \delta, \theta, \sigma_\epsilon^2)$ from equation (16) with respect to $\sigma_\epsilon^2$ we obtain that:

[display illegible in the source]

Thus it follows that (Taylor expanding the other MLEs and using their rates of convergence):

[display illegible in the source]

for $J_1$, $J_2$ sufficiently large. This follows as the estimators of the other parameters of the sdf
are nearly unbiased for sufficiently large values of $N$. We note that the covariance of $\hat\sigma_\epsilon^2$ with
$\hat\delta$ and $\hat\xi$ can be treated analogously to the results deriving the covariance of $\hat\xi$ and $\hat\delta$, or using
standard results for $\hat\delta$ and/or $\hat\theta$ and $\hat\sigma_\epsilon^2$. Denoting by $h(\lambda) = h(\lambda; \theta)$ the regular spectral
component and by subscripts its derivatives with respect to $\theta$ [definitions illegible in the source],
we determine that

[displays illegible in the source]

which is $O(N)$. This then provides the required score equations. Furthermore using regular ML
theory, we have that:

[display (27) illegible in the source]

where $V$ contains the Fisher information, and $N^{-1} V_N \to V$. This allows us to fit the more
general class of GARMA rather than Gegenbauer models, see Gray et al. (1989).

4 Examples



4.1 Analysis of Simulated Data 

For our simulation studies we examine the performance of our adjusted Whittle likelihood-based
estimators in comparison with those derived from the classic Whittle likelihood. Data
were simulated in the time domain using the covariance recursion formulae given in Lapsa
(1997) for a seasonally persistent Gegenbauer process with $\xi = 1/7$, corresponding to a
weekly cycle in daily data, and $\delta = 0.3$, $0.4$ and $0.45$. We generate 2000 replicate series of
lengths $N = 1024$, 2048, 4096 and 8192. Tables 1 to 4 demonstrate the performance of the ML
estimators of $\xi$ and $\delta$, in terms of bias, variance, and the relative efficiency of our
estimators compared with those derived using the classic Whittle likelihood. For $N = 1024$
the demodulated estimator significantly reduces the bias present in the Whittle estimators
for both $\xi$ and $\delta$. As $N$ increases and the spacing in the Fourier grid decreases, both estimators
for $\xi$ perform well; however, the bias in the Whittle estimator for $\delta$ is still present even for
$N = 8192$, and becomes more severe as $\delta$ increases.

To illustrate the problems with the Whittle likelihood for smaller $N$ and large $\delta$, Figure
1 shows the mean conditional likelihoods evaluated at the ML estimates. The improvement
gained by demodulation is evident; the scale of the improvement will depend on the
distance of the pole from the Fourier grid. The plots also demonstrate the discontinuities in
the likelihood for $\xi$, as discussed in Section 3.



4.2 U.S. Weekly Crude Oil Imports 

The first real data set comprises 756 observations of U.S. Weekly Crude Oil Imports (in 
millions of barrels per day) from 6th December 1991 to 26th May 2006, downloaded from 

http://tonto.eia.doe.gov/dnav/pet/hist/wcrimus2w.htm.

The data were detrended using a linear trend, and are displayed in Figure 3. Periodic be- 
haviour is evident in the raw detrended data. 

For these data, we fitted a low order Gegenbauer-ARMA (GARMA) model; the process
$\{X_t\}$ is represented as the unique stationary solution of

$$\phi(B) X_t = \theta(B) G_t,$$

where $B$ is the backshift operator, the polynomial operators $\phi$ and $\theta$ define an ARMA process
in the usual way, and $\{G_t\}$ is a pure Gegenbauer process as defined by the sdf in
equation (3) with $h$ the identity function. We consider at most ARMA(1,1) models, so that
$\phi(z) = 1 - \phi z$ and $\theta(z) = 1 + \theta z$ where, under the assumptions of stationarity and invertibility,
$|\phi|, |\theta| < 1$. Using standard results, the parametric sdf that we consider takes the form

$$f(\lambda) = \frac{\sigma_\epsilon^2}{|1 - 2e^{-2i\pi\lambda}\cos(2\pi\xi) + e^{-4i\pi\lambda}|^{2\delta}} \cdot \frac{1 + 2\theta\cos(2\pi\lambda) + \theta^2}{1 - 2\phi\cos(2\pi\lambda) + \phi^2}.$$
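The spectral density of the GARMA(1,1) model can be evaluated directly. The sketch below is illustrative only: it assumes the standard ARMA(1,1) factors $|1 + \theta e^{-2\pi i\lambda}|^2$ and $|1 - \phi e^{-2\pi i\lambda}|^2$ together with a Gegenbauer factor of exponent $2\delta$, and the function name `garma_sdf` is hypothetical.

```python
import cmath
import math


def garma_sdf(lam, xi, delta, phi=0.0, theta=0.0, sigma2=1.0):
    """Spectral density of a GARMA(1,1) process at frequency lam (cycles per unit time).

    xi is the pole location, delta the persistence parameter, phi and theta the
    AR and MA coefficients, and sigma2 the innovation variance.
    """
    z = cmath.exp(-2j * math.pi * lam)
    modulus = abs(1.0 - 2.0 * z * math.cos(2.0 * math.pi * xi) + z * z)
    if modulus == 0.0 and delta > 0.0:
        return math.inf  # the density is unbounded at the pole
    gegenbauer = modulus ** (2.0 * delta)
    ma = 1.0 + 2.0 * theta * math.cos(2.0 * math.pi * lam) + theta ** 2
    ar = 1.0 - 2.0 * phi * math.cos(2.0 * math.pi * lam) + phi ** 2
    return sigma2 * ma / (gegenbauer * ar)
```

Since $|1 - 2z\cos(2\pi\xi) + z^2| = |2\cos(2\pi\lambda) - 2\cos(2\pi\xi)|$ for $z = e^{-2\pi i\lambda}$, the Gegenbauer factor vanishes only at $\lambda = \pm\xi$, which is where the density has its pole when $\delta > 0$.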

In our notation, a GARMA(1,1) model has both $\phi$ and $\theta$ non-zero; for GARMA(1,0), $\theta = 0$,
whereas for GARMA(0,1), $\phi = 0$. GARMA(0,0) corresponds to the Gegenbauer model with
no ARMA component.

Results : Using numerical methods (the optim function in R), each model was fitted using
our demodulation approach and also using the standard Whittle likelihood, and the results
were compared using BIC. The results are presented in Table 5. The best model overall is
the GARMA(0,1) model fitted under demodulation, indicating that the use of a non-standard
Fourier grid can improve the quality of fit; that is, the fit of the model under the standard
derivation (we term this the standard Whittle model) is inferior.

For the selected model, the parameter estimates and approximate standard errors are
displayed in Table 6.

4.3 Farallon temperature data

The second real data set is a surface temperature series for the shore station at the Farallon
Islands, California, United States. Daily temperature data were obtained from the ftp site

ftp://ccswebl.ucsd.edu/shore/CURRENT_DATA/Temperature/

and formed into monthly averages for the period 1960-1996. Missing daily quantities were omitted
from the monthly averages; whole missing months (there were six in the period of study) were
imputed by taking averages for that calendar month across the 37 years of study. In total
there were 444 monthly average observations.

We analyze these data in two ways to compare the Whittle maximum likelihood estimates 
with our demodulation approach. First, we take the 444 data in their entirety, then we perform 
a second analysis using only the last 440 observations. As the expected annual periodicity 
would induce a pole in the spectrum at frequency 1/12, and 12 divides 444, the pole will lie 
at a Fourier frequency when the whole data set is analyzed. However 12 does not divide 440, 
so for the second analysis, the pole will not lie at a Fourier frequency. 
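The divisibility argument can be checked with exact arithmetic. The following is a small illustrative sketch; the function names are hypothetical. A pole at frequency $\xi$ lies on the Fourier grid $\{k/N\}$ exactly when $N\xi$ is an integer.

```python
from fractions import Fraction


def pole_on_fourier_grid(n, pole):
    """True when pole * n is an integer, i.e. the pole equals some Fourier frequency k/n."""
    return (Fraction(pole) * n).denominator == 1


def offset_from_grid(n, pole):
    """Distance from the pole to the nearest Fourier frequency k/n, as an exact fraction."""
    scaled = Fraction(pole) * n
    nearest = round(scaled)
    return abs(scaled - nearest) / n


if __name__ == "__main__":
    annual = Fraction(1, 12)
    print(pole_on_fourier_grid(444, annual), offset_from_grid(444, annual))
    print(pole_on_fourier_grid(440, annual), offset_from_grid(440, annual))
```

For $N = 444$ the pole sits at the ordinate $k = 37$; for $N = 440$ it falls a third of a grid spacing away from the nearest ordinate.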

Results : Each of the low order GARMA models was fitted and compared using BIC. The
two cases, $N = 444$ and $N = 440$, were analyzed. The results are presented in Table 7, and
the raw time series as well as fitted models are plotted in Figure 4. The model with the
lowest BIC is, in both cases, the GARMA(1,0), but a different approach is favoured for each
value of $N$. For $N = 444$, the classic Whittle approach yields
a higher log-likelihood, but for $N = 440$ the demodulated model performs better, yielding a
higher log-likelihood. Parameter estimates from the model are presented in Table 8 for the
two values of $N$.

This data set and analysis illustrate another of the advantages of using the
demodulated likelihood with the bias-adjustment procedure outlined in Section 3 and
Theorem 1. In the classic Whittle likelihood, when a Fourier frequency exactly coincides with
the pole, the on-the-pole likelihood contribution erases the contribution of that periodogram
element. Note first that the omission of a data point from the likelihood causes the likelihood
to increase (that is, become less negative) and this explains the higher likelihood value for the
classic Whittle likelihood. For the Farallon data set, this omission also leads to the remaining
periodogram appearing as if it corresponded to a short memory process, hence the low
estimated value of $\delta$ that is essentially no different from zero. The conclusion of such an analysis
would be that the underlying process has a pure seasonality at the estimated $\xi$, in this case
$\xi = 1/12$, and that the seasonally differenced series is essentially a white noise process. However,
seasonal first differencing of the original series leads to a new series that is not a white noise
process; in fact the differenced series appears over-differenced. Hence, such a model does not
provide an adequate explanation of the data. When $N$ is changed to 440, inferences using the
classic Whittle method change dramatically. Note, however, that for the new demodulated
likelihood, parameter estimates are closely comparable across different values of $N$.

4.4 Southern Oscillation Index 

We consider the Southern Oscillation Index (SOI) data analyzed by, for example, Huerta
and West (1999). The version of the data we consider has $N = 1668$; the data and fitted
spectrum are presented in Figure 5. For this large sample size, the difference between the two
approaches is minimal; the BIC values are negligibly different, and the estimates and estimated
95% intervals are presented in Table 9. In this case, the estimates of the pole position obtained
from the likelihood approaches are markedly different from the naive estimate obtained by
taking the ordinate corresponding to the maximum of the periodogram (shown as a dotted
line in Figure 5(b)).

5 Implications for Non-Likelihood Approaches 

The results derived in previous sections focus explicitly on likelihood based procedures. How- 
ever, they motivate the use of adjusted versions of currently existing estimation procedures 
that improve the performance of those procedures when applied to seasonally persistent series. 
Given the special role of the location of the singularity when formulating the likelihood, we
propose a series of procedures that profit from the simplified distribution that arises by using
demodulation by the (estimated) pole.

5.1 Profile Likelihood 

The profile likelihood of $\xi$ is a pseudo-likelihood function given, for each possible $\xi$, by

$$\ell_P(\xi) = \max_{\delta, \theta \mid \xi} \ell(\xi, \delta, \theta). \tag{28}$$

$\ell_P(\xi)$ may be maximized, yielding a maximum pseudo-likelihood (MPL) estimate of $\xi$, denoted
$\hat\xi_{PL}$. Then the values of $\delta$ and $\theta$ which maximize the conditional likelihood given $\xi = \hat\xi_{PL}$
are computed. Specifically, the ML estimate of $\delta$ based on the demodulated likelihood for all
values of $\xi \in (0, 1/2)$ is computed. Finally, the estimate $\hat\delta = \hat\delta(\hat\xi_{PL})$ based on demodulation at
$\hat\xi_{PL}$ is obtained.
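The two-stage procedure just described can be sketched generically. The code below is a minimal illustration under simplifying assumptions: the log-likelihood is passed in as a function of $(\xi, \delta)$ only, both maximizations are done by grid search rather than a numerical optimizer, and the names `profile_likelihood` and `loglik` are hypothetical.

```python
def profile_likelihood(loglik, xi_grid, delta_grid):
    """Profile out delta for each xi, then maximize the profile over xi.

    Returns (xi_hat, delta_hat, profile), where profile is a list of
    (xi, best_delta, profile_value) triples, one per candidate xi.
    """
    profile = []
    for xi in xi_grid:
        # Inner step: conditional maximization over delta for this fixed xi.
        best_delta = max(delta_grid, key=lambda d: loglik(xi, d))
        profile.append((xi, best_delta, loglik(xi, best_delta)))
    # Outer step: maximize the profile likelihood over xi.
    xi_hat, delta_hat, _ = max(profile, key=lambda row: row[2])
    return xi_hat, delta_hat, profile
```

In an application `loglik` would wrap the demodulated likelihood of Theorem 1 evaluated at the given $\xi$; because that likelihood has small discontinuities in $\xi$ (Note II), a grid or derivative-free search over $\xi$ is a natural choice for the outer step.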

In many cases the MPL and ML estimators agree closely; in given applications, the MPL
approach may potentially be more readily implemented. Note that some care must in general
be used when applying profile likelihood estimation (see, for example, Berger et al.
(1999)), but given the rate of convergence of the MLE of $\xi$ such problems are unlikely to arise.

5.2 A Semi-Parametric Analysis: The Geweke-Porter-Hudak Estimator



The Geweke-Porter-Hudak (GPH) procedure (Geweke and Porter-Hudak, 1983) implements
semiparametric estimation of $\delta$ for the case $\xi = 0$, and can be adapted to incorporate
a demodulation procedure and profile marginalization. The GPH procedure examines the
behaviour of the periodogram on the log scale near frequency zero, and estimates the long
memory parameter $\delta$ using ordinary least squares and a linear regression. We omit full details
for brevity, but outline a possible adjustment based on a recent formulation given by Hidalgo
and Soulier (2004). It is sufficient to say that the GPH procedure relies on distributional
properties of the periodogram near the presumed pole.

In light of the results of earlier sections of this paper, to obtain an improved estimate of
$\delta$ using GPH we could take two alternative approaches. First, we could adjust the GPH
to the demodulated setting, taking the distribution of the periodogram at the singularity fully
into account. Alternatively, we could utilize large sample arguments and consider the score
function. We consider frequencies indexed $j$ whose likelihood contributions are influenced by
the singularity. The log-likelihood then has three parameters: $\xi$, $\delta$, and $C$, a function of
$(\xi, \delta, \theta, \sigma_\epsilon^2)$ which can be treated as a constant if the $j$ included are chosen judiciously.

Hidalgo and Soulier (2004, p. 58-59) consider the modified GPH by introducing the following
notation:

$$g(x) = -\log(|1 - e^{ix}|), \qquad \bar g_m = \frac{1}{m} \sum_{k=1}^{m} g(2\pi\psi_k), \qquad s_m^2 = 2 \sum_{k=1}^{m} \{g(2\pi\psi_k) - \bar g_m\}^2,$$

where $m$ periodogram ordinates on either side of the pole are included in the regression. They
also define (with slightly different notation) $a_k = s_m^{-2}\{g(2\pi\psi_k) - \bar g_m\}$, and define the estimator
to be:

$$\hat\delta_{GPH} = \sum_{1 \le |k| \le m} a_k \log\{I_0(\psi_k + \hat\xi)\}. \tag{29}$$

Hidalgo and Soulier note that the asymptotic distribution of $\hat\xi$ is not known, and that
estimation of the pole is an open problem. In their simulation studies, 5000 replications of
series of length 256, 512 and 1024 are used, with $\xi = 1/4$, and thus there is grid alignment with
the pole. They implement the GPH procedure, assuming $\hat\xi$ is correct, on the demodulated
periodogram, excluding the contribution from the pole itself. Notice also that they chose
$m = N/4$, $m = N/8$ and $m = N/16$, i.e. $m = O(N)$.

Having found the distribution of the periodogram at the pole in Lemma 2.1, we can adjust
the GPH estimator using a similar profile likelihood approach. Assume that $\xi$ is known, and
consider $k$ such that

$$\frac{2 I_0(\lambda_k) |\lambda_k|^{2\delta}}{C} \sim \chi^2_2.$$

As $\xi$ is known we are on the grid, and $B_{\lambda_k,N}(\xi, \delta) = 1 + O(k^{-1})$. Thus we may ignore the
contributions of the $O(k^{-1})$ term, and omit this from the procedure as the terms sum to a
negligible contribution. Based on these values of $k$, least squares is then used to estimate
$\delta$. This requires knowledge of $\xi$; note that Hidalgo and Soulier estimate $\xi$ as the Fourier
frequency at which the periodogram is maximized, and are therefore restricted to an $O(N^{-1})$
grid. In contrast, for any $\xi$, we demodulate the periodogram by $\xi$, giving

$$\log\{I_0(\lambda_k(\xi))\} + 2\delta \log(|\lambda_k(\xi)|) - \log(C) \sim \log(\chi^2_2 / 2),$$

where $\lambda_k$ is calculated for the specified $\xi$, not necessarily on the Fourier grid; in practice it
is straightforward to use a finer grid over which to do a systematic search. More generally
we may allow $\xi$ to vary continuously across the interval $(0, 1/2)$, use numerical routines, and
choose $\xi$ to minimize the residual sum of squares after a least squares fit. Note that we can
approximate the distribution of the log periodogram accordingly only if we demodulate, as
otherwise the distribution is shifted in location by a constant depending on $\xi$. In this case the
correlation between the periodogram at frequencies spaced apart is non-negligible, thus
necessitating usage of weighted least squares.
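The least squares step can be sketched as follows. This is a simplified illustration, not the adjusted estimator itself: it assumes the log-linear relation $\log I_0(\lambda_k) = a - 2\delta \log|\lambda_k - \xi| + \text{error}$ near the pole, uses ordinary rather than weighted least squares, and the function name `log_periodogram_fit` is hypothetical. The intercept absorbs $\log C$ and the mean of the log error.

```python
import math


def log_periodogram_fit(offsets, periodogram):
    """OLS fit of log I_k = a - 2 * delta * log|offset_k|.

    offsets are the distances lambda_k - xi of the ordinates from the assumed pole
    (all non-zero); returns (delta_hat, a_hat, rss), where rss is the residual sum
    of squares that could be minimized over candidate pole locations.
    """
    xs = [-2.0 * math.log(abs(u)) for u in offsets]
    ys = [math.log(v) for v in periodogram]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    delta_hat = sxy / sxx  # slope of the regression on -2 log|offset|
    a_hat = y_bar - delta_hat * x_bar
    rss = sum((y - a_hat - delta_hat * x) ** 2 for x, y in zip(xs, ys))
    return delta_hat, a_hat, rss
```

To search over the pole location, one would recompute the demodulated periodogram for each candidate $\xi$, call the function, and keep the candidate with the smallest `rss`.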



6 Discussion 



This paper has illustrated the inherent problems with seasonally persistent processes and
approximation based on the periodogram. We have demonstrated that realigning the grid of
frequencies at which the periodogram is evaluated simplifies the distributional properties
and enables us to specify a useful approximation to a likelihood function. Analysis of seasonal
persistence will usually be based on frequency domain descriptions. For the usual Fourier grid,
the distributional properties of the periodogram given by previously derived theory
(Olhede et al., 2004) are generally not useful for SPPs. This paper shows how a small technical
adjustment, from the DFT to the DDFT, alters the distributional properties substantially, making
analytic investigation of the properties of the MLEs possible. The theoretical and practical
utility of this adjustment is apparently under-appreciated in the literature. Potentially, even
for short memory models (with bounded but highly peaked spectra) for moderate values of
$N$, there will be an advantage in demodulation.

In this paper, attempts have been made to fill the gaps of current theory. To avoid the
problems associated with the location of the singularity, Giraitis et al. (2001) constrained the
maximization of the Whittle likelihood to a set of frequencies spaced $O(N^{-1})$ apart, where the
likelihood performs well under the assumption that the true value of location of the singularity
is constrained to this set. Their important result states that $\hat\xi - \xi^* = O_p(N^{-1})$. In fact, this
is ensured (informally) by picking a Fourier frequency a distance $C/N$ from the singularity,
hence not even necessarily the closest Fourier frequency. In contrast, we have studied the
sensitivity of the likelihood of the periodogram to $O(N^{-1})$ perturbations in $\xi$, and found that
the estimate of $\delta$ for large finite sample sizes is very sensitive to such variation, thus clarifying
that despite the very rapid convergence of $\hat\xi$ to $\xi^*$ the potential misalignment of the Fourier
grid with the unknown $\xi^*$ must be acknowledged. We also derive the large sample form of
the distribution of $\hat\delta$ and $\hat\xi$, where the latter when re-normalised appropriately has a scaled
Cauchy distribution.

Our results relate to frequency domain based analysis at some grid of frequencies. For
processes with absolutely convergent autocovariance sequences, in large samples no gain is
made by a particular choice of Fourier domain gridding; however, for processes with seasonal
persistence it is of fundamental importance to choose the correct grid alignment, even in large
samples, as this simplifies the distributional results substantively.

Simulated examples show the superiority of our approach in finite sample situations.
Furthermore, the methodology has the philosophical advantage of acknowledging the estimation
of $\xi$. While other methods do well asymptotically for estimation of the long memory
parameter, it is worth noting that for any fixed (maybe large) sample size, improvements can usually
be found by explicitly considering the estimation of $\xi$ separately. The profile likelihood
methods can be simply employed in the extended GPH estimator discussed by Hidalgo and Soulier
(2004), extending the ideas to semi-parametric models, and facilitating a tractable analysis.

Table 1: Demodulated ML estimates of $\xi$



N      $\delta$   bias (×10^…)   sd (×10^-3)   rel. eff.   95% interval
1024   0.30        0.8125         1.0218        0.7793      (0.1406, 0.1453)
1024   0.40        1.2402         0.5829        0.8262      (0.1416, 0.1442)
1024   0.45        0.6699         0.3574        0.6424      (0.1421, 0.1435)
2048   0.30       -4.4170         0.5355        0.7461      (0.1414, 0.1439)


2048   …           …              0.3011        0.6511      (0.1422, 0.1435)
2048   …           0.7178         0.1958        0.5286      (…, 0.1433)
…      0.30        1.1035         0.2770        0.9237      (0.1422, 0.1436)
…      0.40        0.4150         0.1459        …           (…, 0.1432)
…      …          -0.5254         0.0941        1.0443      …
…      …           …              0.2031        …           (…, 0.1433)
…      …           0.2851         0.1450        0.9986      (0.1426, 0.1432)
…      …           0.1254         0.0722        1.0283      (0.1426, 0.1431)


Table 2: Whittle estimates of $\xi$

N      $\delta$   bias (×10^…)   sd (×10^-3)   95% interval
…      …           …              1.1575        (0.1406, 0.1455)
1024   0.40        …              …             (0.1426, 0.1433)
…      0.45       -3.4389         0.0920        (0.1426, 0.1431)
…      0.30       -0.1842         …             (0.1424, 0.1433)
…      0.40       -0.2633         0.1451        (0.1426, 0.1432)
…      0.45       -1.1890         0.0712        (0.1426, 0.1431)


Table 3: Demodulated ML estimates for $\delta$

N      $\delta$   mean     bias (×10^-4)   sd (×10^-2)   rel. eff.   95% interval
…      …          0.4499   -0.8000         1.2573        0.4348      (0.4246, 0.4736)


4096   0.30       0.3001    1.3333         0.9528        0.9523      (0.2818, 0.3200)
4096   0.40       0.4003    2.9515         0.9475        0.8964      (0.3812, 0.4188)
4096   0.45       0.4505    5.0612         0.9173        0.7909      (0.4316, 0.4663)
8192   0.30       0.3001    1.0872         0.5028        1.0068      (0.2912, 0.3124)
8192   0.40       0.3997   -2.5643         0.4655        0.9099      (0.3906, 0.4088)
…      0.45       0.4497   -3.0083         0.4056        0.9313      (0.4414, 0.4576)



Table 4: Whittle estimates for $\delta$

N      $\delta$   mean     bias (×10^-4)   sd (×10^-2)   95% interval
1024   0.30       0.3004     4.0909        2.0630        (0.2606, 0.3436)
1024   0.40       0.4076    76.1111        2.2083        (0.3636, 0.4505)
1024   0.45       0.4677   176.915         2.2807        (0.4209, 0.4997)
2048   0.30       0.3015    14.9899        1.4696        (0.2717, 0.3293)
2048   0.40       0.4077    76.8081        1.6748        (0.3737, 0.4414)
2048   0.45       0.4680   180.012         1.9068        (0.4324, 0.4997)
4096   0.30       0.3004     4.0204        0.9764        (0.2816, 0.3204)
4096   0.40       0.4018    17.5306        1.0008        (0.3816, 0.4205)
4096   0.45       0.4537    36.8571        1.0285        (0.4337, 0.4745)
8192   0.30       0.3003     3.3474        0.5011        (0.2874, 0.3133)
8192   0.40       0.4009     9.3895        0.4880        (0.3900, 0.4131)
8192   0.45       0.4522    21.9531        0.4203        (0.4401, 0.4651)


Table 5: BIC values for U.S. Petroleum Data



Method             Model        BIC
Demodulated        GARMA(0,0)   71.246
                   GARMA(1,0)   32.211
                   GARMA(0,1)   27.153
                   GARMA(1,1)   30.921
Standard Whittle   GARMA(0,0)   73.330
                   GARMA(1,0)   42.794
                   GARMA(0,1)   38.905
                   GARMA(1,1)   42.021



Table 6: U.S. Petroleum data - GARMA(0,1) parameter estimates

                  $\xi \times 10^{-2}$   $\delta$          $\theta$            $\sigma_\epsilon^2$
Estimate          1.918           0.295           -0.517            0.372
Approx 95 % CI    (1.762,1.956)   (0.221,0.384)   (-0.675,-0.360)   (0.337,0.412)



Table 7: BIC values for Farallon data.

Method        Model        BIC (N = 444)   BIC (N = 440)
Demodulated   GARMA(0,0)   274.918         272.710
              GARMA(1,0)   257.080         254.612
              GARMA(0,1)   266.528         262.589
              GARMA(1,1)   262.474         259.754
Standard      GARMA(0,0)   278.567         281.290
              GARMA(1,0)   243.009         260.035
              GARMA(0,1)   265.769         274.484
              GARMA(1,1)   246.742         265.129



Table 8: Farallon data - GARMA(1,0) parameter estimates.

Method        N     Quantity   $\xi \times 10^{-2}$   $\delta$          $\phi$            $\sigma_\epsilon^2$
Demodulated   444   Estimate   8.358           0.221           0.628           0.431
                    95 % CI    (8.206,8.438)   (0.157,0.314)   (0.558,0.726)   (0.266,0.562)
Demodulated   440   Estimate   8.295           0.234           0.644           0.401
                    95 % CI    (8.290,8.391)   (0.156,0.311)   (0.558,0.728)   (0.252,0.556)
Standard      440   Estimate   8.409           0.156           0.629           0.520
                    95 % CI    (8.222,8.497)   (0.133,0.305)   (0.562,0.736)   (0.286,0.594)



Table 9: SOI data - GARMA(0,0) parameter estimates

Method        Quantity         $\xi \times 10^{-2}$   $\delta$          $\sigma_\epsilon^2$
Demodulated   Estimate         2.366           0.237           0.782
              Approx 95 % CI   (1.452,2.399)   (0.215,0.254)   (0.728,0.833)
Standard      Estimate         2.247           0.235           0.778
              Approx 95 % CI   (1.402,2.381)   (0.215,0.255)   (0.730,0.833)
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(a) (b) 




[Figure 2 panels: (a) $\delta = 0.40$; (b) $\delta = 0.45$.]

Figure 2: Simulated Data: The finite $N$ approximation to the distribution of the standardized
estimator of the pole for (a) $\delta = 0.4$ and (b) $\delta = 0.45$. The dotted and dash-dotted curves give the
proposed finite large sample approximation for different values of $N$, whilst the solid line gives the
Cauchy asymptotic form. It is clear from the plot that for large values of $\delta$ the distribution is quite
slow to converge to the Cauchy.
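
The contrast drawn in Figure 2, between a ratio of two Gaussian variables and a Cauchy limit, can be illustrated numerically. The sketch below is not the approximation derived in the text: the means and standard deviations passed to `gaussian_ratio_samples` are hypothetical placeholders, not the sample-size-dependent values of the paper. It only illustrates that a ratio of independent zero-mean Gaussians is standard Cauchy, and that a non-zero mean in the denominator moves the ratio away from that form.

```python
import numpy as np


def gaussian_ratio_samples(mu_num, sd_num, mu_den, sd_den, size, seed=0):
    """Draw samples of the ratio of two independent Gaussian variables."""
    rng = np.random.default_rng(seed)
    numerator = rng.normal(mu_num, sd_num, size)
    denominator = rng.normal(mu_den, sd_den, size)
    return numerator / denominator


def quartiles(samples):
    """Return the lower quartile, median and upper quartile of the samples."""
    return np.quantile(samples, [0.25, 0.5, 0.75])


# Zero means, unit variances: the ratio is standard Cauchy,
# whose quartiles are -1, 0 and 1.
cauchy_like = gaussian_ratio_samples(0.0, 1.0, 0.0, 1.0, 200_000)

# Hypothetical non-zero denominator mean (an assumption for illustration):
# the ratio is no longer Cauchy.
shifted = gaussian_ratio_samples(0.0, 1.0, 2.0, 1.0, 200_000)

print(quartiles(cauchy_like))
print(quartiles(shifted))
```

Quartiles are used for the comparison because the Cauchy distribution has no finite mean or variance.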
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(b) Fit of two GARMA models 



Figure 3: U.S. Petroleum Data: Raw data and spectral fits of GARMA(0,0) and GARMA(0,1)
models. The GARMA(0,1) model yields a lower BIC value.
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1960 1970 1980 1990 

[Figure 4 panels: (a) Raw Data, plotted against Date; (b) Fit of two GARMA models.]

Figure 4: Farallon data: Raw data and spectral fits of GARMA(0,0) and GARMA(1,0)
models to the data set with $N = 440$ observations. The fits of the models using the Whittle
likelihood are similar, but inferior in BIC terms.
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1920 1940 
[Figure 5 panels: (a) Raw Data, plotted against Date; (b) Fit of GARMA models under standard and
demodulated approaches, with legend entries Standard, Demodulated and Maximum Ordinate Estimator,
and a horizontal axis running from 0.00 to 0.12.]

Figure 5: Southern Oscillation Index data: Raw data and spectral fits of GARMA(0,0) model
under standard Whittle and demodulation. For comparison, the estimator that takes the
maximum periodogram ordinate as the estimate is also displayed.
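
The maximum periodogram ordinate estimator mentioned in the caption of Figure 5 can be sketched as follows. This is a minimal illustration under our own assumptions: the periodogram normalisation $|\mathrm{DFT}|^2/N$, the mean correction, and the function name `max_ordinate_frequency` are not taken from the paper, and the search is restricted to Fourier frequencies strictly between 0 and 1/2.

```python
import numpy as np


def max_ordinate_frequency(x):
    """Return the Fourier frequency in (0, 1/2) with the largest periodogram ordinate."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Periodogram of the mean-corrected series at the non-negative Fourier frequencies.
    periodogram = np.abs(np.fft.rfft(x - x.mean())) ** 2 / n
    freqs = np.fft.rfftfreq(n)  # frequencies in cycles per sample
    # Exclude frequency zero, and the Nyquist frequency when n is even.
    stop = freqs.size - 1 if n % 2 == 0 else freqs.size
    k = 1 + np.argmax(periodogram[1:stop])
    return freqs[k]


# Usage: a noisy sinusoid whose frequency lies on the Fourier grid.
t = np.arange(512)
rng = np.random.default_rng(1)
series = np.cos(2 * np.pi * 0.125 * t) + 0.1 * rng.normal(size=t.size)
print(max_ordinate_frequency(series))
```

By construction this estimator can only return a point of the Fourier grid, so its resolution is limited to the grid spacing $1/N$.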
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A Appendix: Proofs 



A.1 Expectation of the Periodogram at the Pole

Starting with the same method of calculation as in Olhede et al. (2004, p. 623), we find a large $N$
approximation to the expected value of the periodogram at $\xi$, after demodulation via $\xi$. We have

[N^'PiOi N^'P{OJ,-J^ lA-ePTVsin^MA-e)} + ^^'^ 



1 sin'iTTu) , ,,,2 /-^sin^ TTu) , 

/ I ,n\J du + o 1 = / „ .\ „ ^ du + o 1 



= -T{-l~~2S)cos{Tr{l/2 + S)}2^^+^7:^^-^ +o{l)^B^{^,S)+o{l) (A-1) 

This result implicitly defines $B_\xi(\xi,\delta)$. The final line follows from Gradshteyn et al. (1994, §3.823).
From equation (3),

^ {47r|sin(27rO|}'' °' 

as $\lambda \to \xi$. Thus after demodulation, the expectation at the singularity is given by equation (A-1):

^ , „.2r(-l-25)cos|7r(i + ^)|o-2|/i(^;6>)|^ , 

TT |2 |sm(27r4)|} 

A.2 Bounding the Covariance Contributions

Under Gaussianity of the original time series, the DDFT will also be jointly proper complex Gaussian,
thus we need only approximate, for large $N$, the first and second order joint properties of these
variables; the zeroth order properties are given in Appendix A.1, in conjunction with the results in
Olhede et al. (2004).

We consider the discontinuities of the likelihood of the DDFT coefficients explicitly, and also the
effects of ignoring the weak correlation between the Fourier coefficients near the pole (see Robinson
(1995)). It is easier to deal with the demodulated sequence only, and so we shall only evaluate the
frequency domain quantities at frequencies $\lambda_j$ from (14). Let $I_j = I_D(\lambda_j)$, and take $A_j = A_D(\lambda_j)$ and
$B_j = B_D(\lambda_j)$. As we only consider demodulation by $\lambda_D$, in this section we suppress the subscript $D$.
We note that with $i' = i/2$ for $i$ even and $i' = (i-1)/2$ for $i$ odd, and similarly for $j$, then



iSA.,.iv(?,<5)/(A,0 i = j 

(i - j) mod 2 = 1 



,/,iv {^^,S) v'/(Ai')/(A/) + o(l)V/(A,')/(Ai') (i-j) mod2 = 0, z^j 



Let = diag(y^ii?A,,,iv(^,<5)/(AjJ ... ^J ^B^,^,M{^,6)f{Xj,)), and let tc, = D^'ScD^'. 
Twice the log-likelihood based on the sample C\ takes the form: 

2e^^\^,6,e,a^,) = -Ariog(27r)-log|ScJ-CjS^;^CA + o(Ar) 

= -AT log(27r) - log \Dl\ - ClDp'Cx + R{e, ^, 5, al) + o{N) 

= 2ei^,S,e,al) + R{e,tS,al)+o{N), (A-2) 
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where 



R{e,^,6,al) = -log\tc, 



Note that 2£{C,5,e,a'^J = 0{N). Also, logjSc;,! = log{0(l)}. The latter statement holds as the 
magnitude of this object can be bounded by considering the trace of the matrix S^^, and the fact 
that for log(A'') < k < j the covariance terms can be bounded by log(i) (c/ Robinson (1995)). If, 
for Iog(7V) < k < j, wc consider the terms in the log-likelihood involving AjAk and BjBk, then these 
are Oj/c^^ log^(j)}. The higher order terms are obtained by inverting the covariance matrix, and the 
second term coming directly from the order of the contributions. We write E{AjAk} = E{BjBk} = 
Tjkk~'^ log^(j), for Tjk = 0(1) and let T = maxj maxfe \Tjk\. 

When summing the covariance terms we need to split the terms indexed by negative and positive
$j$ into two sums. Consider one of the two sums, and sum the contributions over indices $\log(N) < k <
j < J = O(N)$, denoting the sum $R_2$. To formally derive this for contributions to the left and right of
the pole, we can use twice this term; the order of the contributions is the most important result.
Then we note that, using Minkowski inequality arguments:



1 



\Ri 



< 



1 

N 



j2 ^ ^ 



fc2 



E 



log'(j) 



E 



Ar2 ^ N 

j=log(JV) fe=log(JV) 



j=log(iV) fc=log(JV) 

.J/N f 

{log' {x) + 2 log(a;) log(^) + log' (iV) } 

-'^ Jlog{N)/N 
1 



1 



log(7V) 
"log^(a;) 



log(A'') X _ 

[x{log' X - 2 log(a;) + 1} + 2a;{log(a;) - 1} log(iV) + \og\N)x] ^^^^^^^^ 



dx + o(l) 

J/N 



-, J/N 



+ \og{N) log' (a;) + log(x) log'(Af) 



0(1). 



log(Af)/Ar 

Note that $A_j B_k$ is, for any choice of $j$ and $k$, $o(1)$. Thus, as twice the log-likelihood divided by $N$ is $O(1)$, we can
ignore the covariance contributions. Asymptotically, using the likelihood from equation (16) yields 
equivalent results to using the likelihood constructed from independent exponential random variables 
with non-equal variances, due to the weak correlation between the Fourier coefficients. 
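
The likelihood "constructed from independent exponential random variables with non-equal variances" has a simple generic form, sketched below. Equation (16) is not reproduced in this appendix, so the function is only the textbook likelihood of independent exponential ordinates whose means are the spectral values; the name `exponential_whittle_loglik` and its arguments are our own, and the standardisation used in the paper may differ.

```python
import numpy as np


def exponential_whittle_loglik(periodogram, spectrum):
    """Log-likelihood of independent exponential ordinates with means `spectrum`.

    Each ordinate I_j is treated as exponential with mean f_j (and hence
    variance f_j ** 2), so its log-density is -log(f_j) - I_j / f_j.
    """
    periodogram = np.asarray(periodogram, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    if np.any(spectrum <= 0):
        raise ValueError("spectral values must be strictly positive")
    return -np.sum(np.log(spectrum) + periodogram / spectrum)


# Usage: with a flat spectrum, the likelihood is maximised at the mean ordinate.
ordinates = np.array([0.5, 1.5, 2.0, 4.0])
flat = np.full(ordinates.size, ordinates.mean())
print(exponential_whittle_loglik(ordinates, flat))
```

Ignoring the weak correlation between ordinates is what allows the joint likelihood to be written as this sum of marginal terms.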



A.3 Additional Notation



Define $\psi = (\xi, \delta)^T$, and denote the true values of the parameters by $\psi^*$. We suppress the dependence
on other parameters, i.e. the dependence on $\theta$ and $\sigma_\epsilon^2$. Consider first expansions of the log-likelihood
defined by equation (16). Let



' is,i{^) £5,5 W 



denote the matrix of second partial derivatives. Furthermore, it is convenient to introduce additional 
random variables, required to study the properties of the score and the observed Fisher information. 



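We denote by $\dot{I}_j$ and $\ddot{I}_j$ the quantities $\dot{I}_D(\lambda_j)$ and $\ddot{I}_D(\lambda_j)$ respectively, and by $\mathcal{I}_j^{(f,N)}$, $\dot{\mathcal{I}}_j^{(f,N)}$ and $\ddot{\mathcal{I}}_j^{(f,N)}$
the standardized periodogram, the derivative of the periodogram with respect to $\xi$, and the second derivative of
the periodogram with respect to $\xi$, all evaluated on the shifted grid. Then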



r(/.JV) 



MM 

/(A,)' 

/o(Ao) 



f{f,N) 



^(A,-) 

/o(Ao) 



-^3 



^o(Ai) 

/o(Ao) 



B4(C,(5)A^25+2jt 



0. 
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(A-3) 



These quantities can be written in terms of the real and imaginary part of the DDFT and its derivatives, 
and so we define for j = Ji, . . . , J2 : 

Aj = ^T.tXt cos {27r(C + jlN)t} Bj = X, sin {27r(^ + j/N)t} 

Cj = ^ cos {27r(e + j/N)t} D, = ^ sin {Mi + ]/N)t} 

Ej = ^ Et t'Xt cos {27r(^ + j/N)t} = ^ Et i'^t sin {27r(C + j/N)t} 

Gj = ^^t t'Xt cos {27r(^ + j/N)t} = ^ E* t'X, sin {27r(^ + j/N)t} , 

for j = Ji, . . . , J2, where the sum over t ranges over t = 0, . . . ,N — 1. Also, let 



be the corresponding suitably standardized quantities. We shall also derive expressions for the expec- 
tation of j^-^'^^ and xj-^'^-* and these will be denoted B\. jsi, B\-.n, and B\. jsi, respectively. 
Their variances take quite complicated forms, and we denote the theoretical constants that give their 
forms for X^^'^^ and Xj^'^^ via C\.,n, G^_' ^ and C^^j^, where the first of these terms is a rough 

approximation to the variance of X^-^'^^ . More details follow later in the text when appropriate. Fur- 
thermore, the covariances of the jth and kth DDFT coefficients and their derivatives, are denoted by 
Vx^,Xk,N, Vxj,Xk,N and Wx^,Xk,N respectively. 
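
The cosine and sine sums $A_j$ and $B_j$ on the demodulated grid $\xi + j/N$ can be computed with a single FFT after demodulating the series. The sketch below is an illustration under stated assumptions: the normalising constants in the definitions above are not legible in this copy, so the sums are left unnormalised, and the function names are our own.

```python
import numpy as np


def demodulated_dft(x, xi):
    """DFT of the series demodulated by xi.

    Entry j (taken modulo N) equals sum_t x_t exp{-2 pi i (xi + j/N) t},
    that is A_j - i B_j for the unnormalised cosine and sine sums.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    return np.fft.fft(x * np.exp(-2j * np.pi * xi * t))


def cosine_sine_sums(x, xi, j):
    """Direct evaluation of the unnormalised sums A_j and B_j, for checking."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    angle = 2 * np.pi * (xi + j / x.size) * t
    return np.sum(x * np.cos(angle)), np.sum(x * np.sin(angle))


# Usage: negative indices j are read from the FFT output modulo N.
rng = np.random.default_rng(0)
x = rng.normal(size=64)
ddft = demodulated_dft(x, xi=0.1432)
a_j, b_j = cosine_sine_sums(x, xi=0.1432, j=-3)
print(ddft[-3 % 64], a_j - 1j * b_j)
```

The derivative sums weighted by $t$ and $t^2$ follow in the same way by multiplying the series by the corresponding power of $t$ before the transform.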



A.4 Zeroth Order Properties



To acknowledge the dependence of the likelihood on the indices $J_1 = -j_{0,N}(\xi) + 1$ and $J_2 = M - 1 -
j_{0,N}(\xi)$, and the fact that these indices depend on $\xi$, in this section we write explicitly $\ell(\psi, J_1, J_2)$.
Note that $J_1 < 0$. For any finite value of $N$ this dependence introduces a discontinuity in the log-
likelihood in the form of a jump when the demodulation makes the range to the left decrease by one,
and the range on the right increase by one, or vice-versa. This fact is inconvenient for our calculations,
as it makes the log-likelihood discontinuous and hence not differentiable. However, it transpires that
the magnitude of the discontinuities is of an order that can be ignored for large sample sizes, as will
be shown by the first proposition, so that subsequent calculations will be in terms of $\ell(\psi)$, where $J_1$
and $J_2$ are treated as fixed with respect to $\xi$ and of order $O(N)$.



Proposition 1 Consider the log-likelihood at $\xi = \xi' + \Delta$, and assume that $\xi' \neq 0, 1/2$. Without loss
of generality, assume that $J_2(\xi' + \Delta) = J_2(\xi') + 1$, so that $J_1(\xi' + \Delta) = J_1(\xi') + 1$. Let

$$\Delta_N = \ell(\xi' + \Delta, \delta, J_1 + 1, J_2 + 1) - \ell(\xi' + \Delta, \delta, J_1, J_2)$$

be the magnitude of the discontinuity introduced by perturbing $\xi'$. Then

$$E[\Delta_N] = O(1), \qquad \mathrm{var}[\Delta_N] = O(1),$$

and for every $\epsilon > 0$

$$P(N^{-1} |\Delta_N| > \epsilon) \to 0 \quad \text{as } N \to \infty.$$



Proof: (Sketch) It is straightforward to show that the discontinuities, $\Delta_N$, in the likelihood are
random quantities with mean and variance that are $O(1)$, so after standardization it follows from the
weak law of large numbers that $N^{-1}\Delta_N \to 0$ in probability, and the result follows. Full details are omitted. ■
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This difference between the log-likelihoods at different values of £, that induce a change of the grid 
is $O(1)$. We can therefore apply arguments such as those developed by Coursol and Dacunha-Castelle
(1982), to justify the usage of a form of the likelihood which ignores the jump in the indices, when
deriving the properties of using a form of the likelihood that does experience discontinuities as the
value of $\xi$ alters. We may from the above calculations note that for large samples it is equivalent
to use $\ell(\xi + \Delta, \delta, J_1, J_2)$ or $\ell(\xi + \Delta, \delta)$ in the analysis of the data; see also the detailed discussion by
Dzhamparidze and Yaglom (1983). For our weak convergence result, we standardize the log-likelihood
by a factor of $N^{-1}$ as the log-likelihood terms are both $O(N)$. The log-likelihoods are constructed from
a data-sample of size $N$, and so we can ignore any contribution of order $O(1)$. Then $\ell(\xi,\delta)/N$ will
be $O(1)$, and we shall discuss limits of properties expressed in terms of this standardized quantity.
Finally, informally, whilst any individual term is contributing $O(1)$ to the likelihood, on differentiating
the log-likelihood, this is no longer true: the individual contributions to the score in $\xi$ will be $O(N)$
near the pole, and $O(1)$ away from the pole. This effect renders the discontinuities of even lesser
importance. Note that we can establish a large sample approximation to the distribution of the
standardized log-likelihood. We approximate the sum by an integral and, as the correlation between
the Fourier coefficients is sufficiently weak, we have



These results for the entire log likelihood at any fixed value of the parameters agree with standard 
likelihood theory. We shall see that the behaviour of the pole is such that subsequently no result for 
the estimation of the pole follows as standard likelihood theory would make us anticipate. However, 
with a suitable standardization, the properties of the MLEs and the likelihood are still tractable. 



A.5 Existence and Consistency Proof

The existence of the ML estimators is guaranteed as it is easy to show that the log-likelihood is
everywhere bounded on the parameter space. The proof of consistency proceeds very similarly to
Giraitis et al. (2001), who assume that the maximisation over $\xi$ is over a grid of frequencies, where
each grid-point is spaced $O(N^{-1})$ apart. This is a sensible choice as the estimation is most often
carried out over the Fourier frequency grid via the DFT. Define



and note that this constrains $\log\{f_G(\lambda; \xi, \delta)\}$ to integrate to zero. Giraitis et al. (2001) show strong
convergence of the estimated location of the singularity to the point on the grid closest to the true
value of the pole, using the likelihood defined by equation (8).

The likelihood approximation defined in Theorem 1 cannot be treated identically to the function
of $\xi$ defined in (8), as the Fourier transform in the former likelihood is calculated at a different set
of frequencies whenever a different value of $\xi$ is picked. However, to compare the magnitude of the
log-likelihood at $\xi$ and at $\xi^*$ we need to compare likelihoods based on different Fourier grids. This may
seem problematic, but recall that the DDFT is a linear orthogonal transform, and so both likelihoods
may be directly related to the likelihood of the time domain sample whatever grid is used. It is hence
suitable to compare the magnitude of the likelihood of the DDFT at different grids. To be able to do
this, we introduce some extra notation. Recall the demodulated grid $\lambda_j(\xi) = \xi + j/N$, $j = J_1, \ldots, J_2$.
First, define $j_p = j_{p,N}(\xi, \xi^*) = \arg\min_{j \in \mathbb{Z}} |\xi^* - \lambda_j(\xi)|$. Thus, at any value of $N$, when the true value of
the pole is $\xi^*$ but the likelihood is evaluated at a grid evenly spaced around $\xi$, $j_p$ is the index of
the frequency on the grid demodulated by $\xi$ that is closest to $\xi^*$. Thus $|j_p - N(\xi^* - \xi)| \le 1/2$, and we define
$j_p$ uniquely by taking the least of the possible values if the pole is evenly spaced between two demodulated
Fourier frequencies. Similarly define $\kappa_0 = \lambda_0(\xi) = \xi$ and $\kappa_{j_p} = \lambda_{j_p}(\xi) = \xi + j_p/N$. Thus $\kappa_0$ is the

demodulated Fourier frequency corresponding to $\xi$ whilst $\kappa_{j_p}$ is the demodulated Fourier frequency
closest to $\xi^*$. Note that using the triangle inequality




/G(A;(5,0=^r'/G(A;e,5), 



N\^-e\<N\^- Kj, \+N\Kj,-e\<N\^- Kj, I + 1/2. 



(A-4) 
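
The index $j_p$ of the demodulated grid point closest to the true pole can be computed directly. The sketch below assumes the grid $\lambda_j(\xi) = \xi + j/N$ with $j$ ranging over all integers and resolves ties by taking the smaller index, as described above; the function name `nearest_grid_index` is our own.

```python
import math


def nearest_grid_index(xi_true, xi, n):
    """Index j minimising |xi_true - (xi + j / n)|, taking the smaller j on ties."""
    # ceil(y - 1/2) rounds y to the nearest integer, with halves rounded down.
    return math.ceil(n * (xi_true - xi) - 0.5)


# Usage: the closest grid point is never further than 1 / (2 n) from the pole.
n = 1024
xi = 0.2
for xi_true in (0.1429, 0.3, 0.41):
    j_p = nearest_grid_index(xi_true, xi, n)
    distance = abs(xi_true - (xi + j_p / n))
    print(j_p, distance <= 0.5 / n + 1e-12)
```

The bound of $1/(2N)$ on this distance is the source of the additive term $1/2$ in (A-4).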
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This allows us to consider the properties of the log-likelihood at the same grid explicitly, as P [N |^ — Kjj,,Ar(0 1 ^ ^) ^ 
P (A'' 1^ — > K + 1/2). If we establish the result for |^ — kj^] , we can redefine K to derive the 
same result for iV |^ — . In the vein of Giraitis et al. (2001), to show consistency, we fix e and consider 
choosing K such that 

p(n\5-5*\'^ > +P(^N\^-C\ >iK + l)^ <€ (A-5) 

Let 

UN (t/^) =N\6-S*f + \N^ - jp,Ni^,C)\ ■ 
We may obtain a bound for (A-5), 2P {un (^) > K), by considering 

P [n\S-S*\^ > k) +p(N\l-Kj^,N{le)\ >K)<e. (A-6) 
Define O {K) , a subset of the parameter space (^, 5), defined for each fixed constant K, by 

n{K) = {,!,■ (0,1/2), 5 e (0,1/2), un{iI>)>k}. 

Let ■0* = {I'^joT^*)- Analogous to Giraitis et al., we bound (A-6) by 



P I inf 



lw*)-£(V)} 



< 1 = P I inf 



/uN{tp) < (A-7) 



Note that the constant B^{^,d) (see equation (A-1)) does not explicitly depend on iV or ^ (although 
the bias is computed at a fixed ^). Also denote the Kronecker-delta by 6ij as usual. Consider first 



Io{ko) -^0(^0) 



(^Vg (ko; S, ko) B^,N{t S)N^^p {ko, 6, kq) 



Un + Tn + Sj^,Q - 1 - Io{ko)Wi - IoiKjo)iW2 - W3), 



where 



B^{^,6)N^^fHKo,S,Ko) B^*{e,5*)N^^yKKj„6\Kj,) a^/c («,„; 5, «o) 

where in (1) we have defined Un and as in Giraitis et al. (2001). We can bound the probability in 
(A-7), in a similar fashion: 

P sup |w^^f/Ar|+ sup \ujf^ {1 + Io{Kja)W3}\ 

+ — -I- sup \u~^^ Io{ko)Wi\ + sup \u^^Io{K,jjW2\ > inf U^j^^Tjvl ) 
ij>en(K) il>en(K) ■<t>en{K) j 

Most terms are the same as in Giraitis et al. (2001), and bound in an identical fashion, apart from 
\u^h{KQ)W\\ and \u^h{Kj^)'W'2 \ ■ Clearly 

eJ sup |u^i/o(«o)M^i| i =C2i^"' and eJ sup |u^%(k,JW^2| i = Csi^"'. 
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Hence the result follows, see Theorem 3.1 in Giraitis et al. (2001). The proof follows Giraitis et al.'s 
and thus, for a fixed grid with even spacing from $\xi$, shows that the maximiser in terms of $\xi$ of $\ell(\psi)$
becomes close to the point on the grid closest to $\xi^*$, which by the properties of the grid has to be at
most $1/(2N)$ from $\xi^*$. This obviously is not the distance between the maximiser and $\xi^*$ but can be used
to show convergence in probability. This strategy lets us avoid dealing with problems in the singularity
of the likelihood, as well as the local periodic ripples.



A.6 First Order Properties of Derivatives of the Likelihood

Proposition 2 For an SPP with parameters $\psi^*$, the expectation of the score evaluated at
$\psi = \psi^*$ is zero, that is $E\{\dot{\ell}(\psi^*)\} = o(N)$.

Proof: To deal with the statistical properties, we first note the expectation of the standardized
periodogram as given in Olhede et al. (2004), and Lemma 1 in this proof, so that

$$E\{\mathcal{I}_j^{(f,N)}\} = B_{\lambda_j,N}(\xi,\delta) + o(1) = 1 + O(\log(j)/j) + o(1),$$

by results derived from Robinson (1995). For large $N$, the second order properties of the score are
dominated by the $\dot{\mathcal{I}}_j^{(f,N)}$ terms, which are distributed like quadratic forms of correlated normal random
variables. We start by deriving the expectation of $\dot{I}_j$ in terms of the trigonometrical forms defined in
equation (A-3). We find that:



E{ij} = 47rE{B,C,-AjDj} = -25Nf{X,) 





u 






/ — oo 


j 



Mj - u)} 

= Nf{Xj)Bx,M^,^)+o{N^'), (A-8) 

this defining Bxj^N{£,,S). From this expression it is obvious that Bx^^]^{^,d) = —Bx_j^N{£,,6). For 
large j we have that Bxj,n{^,S) = 0{j~^}, where to derive this result, consider the decomposition 

3+e 



= / +/ +/ +/. +/. 



As in Robinson (1995) we bound the individual contributions of these integrals. Using identical argu- 
ments, we find for j = 0, 

E{io} = 4nE{BoCo-AoDo} = -{2S)N'+''P{0 T ^'^f^ du + o{N^') = o{N^') 

J -oo W [TTU) 

as the integral is zero. Thus when ^ = 

E{j(^'-)}=0 E{jf-)}=ij.^,.(^,<5)+o(l), 

where Bxj,N{^,S) = 0[j~^), and furthermore note that Bxj,n{^,S) ~ —B^Xj,N{^,S). Recall the 
definition of rij from equation (17) in Theorem 1; locally rij = r]_y We also define for j = Ji, . . . , J2 



i?f^ = ^log(r?,.) = -21og 



38 



We may then write 



.J2 



j=Ji 
J2 



^ [5W{i-jf^)}-iv±F'^) 



+ o{N), 



3 = .Jl 

J2 

E {l^ = J2 (log(i)/i) - NBx.M^, ^)) + = o{N), 

j=Ji 
.h 

E{4(V*}} = X]i??^0(log(i)/i)+o(7V)=o(7V). 



Thus £{£5 (V'*}} is o(A'^). This characterizes the first order properties of the score functions. 



A.7 Second Order Properties of Derivatives of the Likelihood

The following result enables us to determine the properties of the Fisher information. We shall discover
that the observed Fisher information does not converge to a point mass, and so the theory that ensues
is far from standard.



Proposition 3 For an SPP with parameters $\psi = \psi^*$, the Fisher information evaluated at $\psi^*$ is
given by



E{F^m} 



where ^^^^ is the expected value of the negative of the second derivative of the log likelihood taken 
with respect to ^, and 

(N) 



with J^g^s and J^^'^j similarly defined. 

Proof: Consider the expectation of the second derivative of the periodogram; we must calculate 

E{ioj} = Stt^E {C] + D] - {A,E, + B,F,}) . 

First, consider Uj = Cj — iDj, and the standardized version t/j'^'^'' = Cj'^''^^ — iD^'^'^'^K After some 
algebra, suitable standardization, and integrating by parts on — l/-\/iV, ^ + 1/VN), with change of 
variable to u where Ci = C + u/N, we have 



fk 



ip{j, k, u)du + 0(1) 



where 



,,. , . , sin |7r(u — 7')} sin |7r(u — „. sin |7r(u — A;)} 

^{j,k,u) = -K^ , \ ^ — -J^-2i-K- ^ ^ ^' 



{■K{u-j)} {7r(w-fc)} 



■k{u — k) 



TTCOS {'k{u — j)} 



{■k{u - j)} 



sin{7r(M- j)} 
7r2{7r(u - 



TT COS {7r(u — j)} sin{7r(u — j} 



{-K{u-j)} 7r2{7r(u - 



TT COS {7r(M — A;)} sin{7r(?i — fc} 



{'k{u — k)} 7r2{7r(w — A:)}^ 
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The calculations are very much in the spirit of Olhede et al. (2004). After some algebra, we have 

E {c/f '^)c/(/'^)*} = ^if,, + 0(1) = ^ {27r2F.^,,,,^(e, 6) + H^A,.A„Ar(?, 6)} + o(l), 



where Vx,,x,,,n{^,S) is defined in Section 2 and we define 



where 



/.tx 
J — o 

d sin{7r(u — j)} 



u 
jk 



Ca,,a.(w) 



Cx^,x^{u) du, 

d sin{7r(M — fc)} 
du 7r(u — k) 



(A-9) 



3 ^ 



= o(l). Hence 



du ■k{u — j) 

Similarly, after standardization, and some algebra, we obtain E 

E {U^Uk*} = E {CjCk} + E {DjDk} + i {E {CjDk} - E {CkDj}} = N^^ f (Xj) f {Xk)-^Kj,k, 
E{UjUk} = E {CjCk} - E {DjDk} + i {E {CjDk} + E {CkDj}} = o!^N^^f{Xj) f (Afc)| . 
Thus, for large A*", 

E{CjCk}=E{DjDk} = ^Ay/ (A,) / (Afe) (5R (Kj^k) + o(l)) 

= Y^A^'/nA^mAl) {27rVA^,A„7v(e, S) + Wx,,x„n{^, S) + o(l)} 
E{CM = -E{CkDj} = ^A^/ (A,) / (Afe) {3 (A,- ^ + o(l)} 
and E{Cj£>j} = o(A^). Using similar calculations, we have that 



E - B,F,} = — <J -27r^i?^;;,(^, S) + B^^^'^i^ S) - C^^^y 



where 



" 87r2 { 

(A -1)2 /•1/2 



A 



fin) 



sin^ {A7r(^ + j7A-M)} 



du 



-il 

2 r^/^ 



1/2 sin^{7r(e + i/A-M)} 

5/(u)sin{A7r($+j/A-u)} 5 sin{A7r($ + j/A - u)} 
1/2 du sin{7r(^ + j'/A — u)} 9u sin{7r(^ + j'/A — «)} 



a sin{A7r(^+j/A-u)} 
9u sin {7r(^ + j/N — u)} 



du. 



Case 1 [$j \neq 0$] For large $N$, looking at the components of this expectation and standardizing
via $f(\lambda_j) N^2$, we find that



B\Jn{^,5) 1 (7V-1)2 /-i/s sin2{A7r(e + j7A-w)} ^ ^ 

/(u) ■ 2\ L rT^rfw = BA,,jv(?,(5)+o(l) 



/(Ai)A2 - /(A,)A2 A ^ sin2{7r(C+j/A-u)} 

This follows directly from OMS. Note that (see Robinson (1995)): 

Bx.Mj, S) = 1 + 0(log(j)/i) + 0(1). (A-10) 
Consider the change of variables from ^ to u, where ^ = ^ + u/N. Then, after some algebra 

1 ^i^]v(^.<5) _ 1 1 f^/^ dy{u)sm^NTT{^+j/N-u)} 



1_ d'f{u)sm^{N7T{^+j/N-u)} 
Vn^J_U2 du^ sm' Me + j/N-u)} "'"^'^^ ^ 



87r2 /(A,)A2 /(A,)A2 23A7r2 7-1/2 du'^ sin^ {niC + j/N - u)} 
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and recalling that 
dv? 



we have 



1 Bi%{^,5) 



E 



87r2 /(A,)Ar2 



25(2^+1) 

^3 



^ rfu + o(l) = + 0(1). 



{7r(j -u)} 



2^2 



Note that the latter integral converges. Note that for j large 

s + j 



26(26 + 1 



oo j2 



{s + jf 



-2(5 . 2/ \ 
Sm (TTS) 

(7rs)2 



ds = o(r') 



which decays (Gradshteyn et al., 1994, §3.821(9)) with increasing $j$. The derivation of this result
resembles that of $B_{\lambda_j,N}(\xi,\delta)$, but the integration over $s$ needs direct appeal, mutatis mutandis, to
the calculations in Robinson (1995), after the term has been taken outside the integration.
Similarly 



87r2 f{Xj)N^ 



f 

J — ( 



"^■^ [sin {7r(j - u)} - cos {7r(j - u)} 7r(j - u)f 



4{7r(j-n)}^ 



+ o(l) 



= ^CA,.Ar(e,5)+o(l) 



We note that Ca ,Ar(^,(5) = W6v ,Aj,Af(Ci i^)- Note that for j large, mutatis mutandis results from 
Robinson (1995) bounding the Dirichlet kernel, for the expectation of the periodogram (up to terms 
0(1)): 



CA,,jv(€,^) = 27r I' 

J — C 



s+j 



jsin (s) — cos (s) s}^ , 2'!t'^ ^ 

i ^-J- — - — ^ ^ ^ ds = — — + O 

3 



log(j) 



27r2 



+ 



log(j) 



which tends to a constant for increasing j. We then have that 
1 „ r V 1 87r2 



(A-11) 



This gives us 



E 



E {D| - AjEj - BjFj + C]} = Ba,,jv(^, S) + o(l). 



E 



{ioj} = I (A,) N^Bx^,n{^, 6) + o{N^). 



(A-12) 



Case 2 [$j = 0$] For large $N$, considering the components of this expectation and standardizing via
$f^\dagger(\xi) N^{2\delta+2}$, it follows that

B^SiU) 1 [N-lf f'/' , sin^{jV7r(^-»)} , r,, ..^^^ 

FlOiv^ = F(0 y-V2 ^^"^ sin^Mc-.)} = + 

This follows directly from Appendix A, including the definition of B^{^,5). Again, using the change 
of variables ^ = ^ + u/N, and a similar series of calculations, 

1 B^i^N{^,S) ^ -26— [°° luf"^^ {u)~^ I . sin (ttm) cos (tth) 



SB5,Ar(^,5)%iv(C,<5)+o(l). 



87r2 
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The B^^]\f{^, 6) term has been added to simphfy subsequent calculations. The integral converges (to 
see this Taylor expansion of the integrand at u = 0). Finally, 

1 C^'^Ni^^S) r°° 25 {-sin (ttu) + cos (ttu) TTu}^ 1 • ,c x\ , ^^\ 



The latter integral also clearly converges. Thus 



^ { Joo} = THT&^E [Dl - AoEo - BoFo + C^} = B^,n{^, 5)B^M^, 5) + 



which yields 

E {/oo} = hNi^, S)B^,Ni^, 6)f (0 iV2+2^ + o(iV2+2^). (A-13) 
These results enable us to determine the properties of the Fisher Information. 



A.7.1 Asymptotic Properties of the Observed Information Matrix

We define 

,(2) _ ^l 5(2) 1 



so that 



Additionally with 



5 = E{«f-^fV4- 



3 



T, 

then we find that 



N d ( 1 1 \ 



Finally with 



we have that: 



Then it transpires 



1"^ 'i'2 C '\ ^ 

^ + E §1 + '5) + E I -2^^^A,,iv(^, 5fy+ N^B.^M^, I + o{N). 

To i^o Ji 1^0 I Jj J 
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We can then note that for N large this sum will be dominated by: 



Note that Ji/N = + o(l) and J2/N = i - ^ + o(l). 



4 1 B,M^,S) 



21og(Ar) 



/o J j^o 





N 


1 (^2 log 






j 



Note that for large values of j, the two peaks of equation (A-8) separate, and the peak around zero is 
actually a symmetric peak and trough, and we have that I3xj,n{^,6) = 0(j~^), whilst I3x.^N{i,5) = 
-Bx_j,n{^,S). If /t(A) admits the representation /t(A) = do{ip) + di{ip){X - + d2{ip){X - C)^ + 
0((A — ^)^). After some algebra 



(JV) 



21og(iV)^+o{log(iV)} 
do 



J2 



do 
2 log 



di,f 
do 



do,^di 



dl 



do 



di,s do,sd\ 
do 



dl 



J_ 
N 

qodo,s 



r(l); 



= 2 log(iV)go + o {log(7V)} - 4Nqo^ log \^\ + = -^f ,5 ^ + o{log N}, 

where qo = do^^/do is a suitable constant. Finally tedious, but trivial calculations, based on the Taylor 
expansion of the function /^(^) yield 



r2 



J2 

■ E 



4 log 



^+41og^ 



■o{N) 



(A-14) 



4. 
2d2 



TV + 27V 



-4 {(log - 1) ^} ^ + 4 (log2 1^1 - 2 log 1^1 + 2) } 



-o(iV) 



= J^s,sN + o{N). 
This proves the large sample properties of the Fisher information matrix. 



A.8 Asymptotic Distributions



A.8.1 Distributions of Standardized Scores



Score in 6: To determine the properties of the MLEs we need to establish the joint distribution of i 
and Wjv, defined in equations (18) and (19). We commence by discussing the first of these quantities. 
A usual central limit theorem will apply for Is {ip*) , and we already noted that E {Is {ip*)) = 0. 
Furthermore: 

var {Is ii^*)} = E <''^A,,^(^' ^) + o(^) = + 
j=Ji 
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and note that J^g^^ = J^s,sN + o{N). We may make the following note and definition: 

Also we noted in the previous section that 

E{-4,4 =^i? = 0(iV), var{-4,^} = J2R?^'bI,n{^^^)+°W = 



NZi+o{VN), Zi^ NiO^J's^s) 

-1/2 „ , C 



(A-15) 



Thus 



TJ/ ^s,s P . 



and so wc may note that as fcjv,2 and Wjv,22 are asymptotically uncorrelated and Gaussian, we find 
that using Slutsky's theorem 



= fciV,2(T/'*)[WjV,22]"' ^^(0,1), 



(A-16) 



and from this result we can deduce Theorem 5. The value of ^j^'' and J-s,sN are given by equations 
(A-14) and (A-15), respectively. 



Score in $\xi$: If the likelihood were sufficiently regular, then the arguments that we used to derive the
distribution of $\sqrt{\mathcal{F}_{\delta,\delta} N}\,(\hat{\delta} - \delta^*)$ could be replicated for $\xi$ instead of $\delta$, and the large sample theory

would be relatively straightforward. However, this is not the case, and we find that for the parameter
$\xi$ the situation is more complicated. The first observation of interest is that the
score is dominated by the derivative of the demodulated periodogram, i.e. $\dot{\mathcal{I}}_j^{(f,N)}$. In fact, with an
appropriate standardization of the score we determine that



1 

Vn 



J2 

E 



{sf{ 



1 - 'h^j 



)-'>,±r'} 



1 

ri/2 



j=Jl 



(A-17) 



where the sum random variable converges in distribution. To be able to determine the large sam- 
ple properties of this object, we thus need to derive the joint distribution of the random variables 
|2-j/,iV)| j{i,N) ^ quadratic form in correlated Gaussian random variables A^-^''^\ B^^''^\ Cj^'^^ 

and Dj^'^\ that make up the standardized derivative of the periodogram. Their joint distribution can 
be determined from their covariance. 



Proposition 4 



cov ■ 



(A-18) 

up to order o(l), where Vx^^x^,^n (C)^) defined in section 2, W\^^\^^j^ {^^^) defined by eqn (A-9) 
and FA,,Afc,Jv(C,'^) is given by 



/OO 
-00 

Note that if log(iV) <k < j then 



s 

jk 



k 



sin{7r(j — s)} sin{7r(fc — s)} 
n'^{j - s){k- s) 

' log(j) 



ds. 



fc2 
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Proof: Define Vj = (Aj, Bj,Cj,Djf and v}^'^'^ = {^^''^^ b'^/''^^ cf'^^ D^'^' } and note that 

its components are correlated normal random variables. Note that E {Vj} = 0. We shall derive the final 
calculations needed to complete the entries of the covariancc matrix of this object, namely cav{Aj, Cj}, 
coy{Bj, Dj}, coy{Bj,Cj}, and cov{Aj, Dj}. As above, integrating in the region ± N~^^^ after 
change of variable to u where ^ = + find that the suitably standardized random variates have 

expectation: 



^(/,JV)_p(/,W) 



}4/ 





i 


' — oo 


u 



2S 



sin^ [7r(j - u)] 



d« + o(l) = BA,,iv(?,5)/4 + 0(l) 



and where the terms including the derivative of Fejér's kernel cancel after a change of variable $u \to -u$.
Also we can note from our calculations of the first differential that



E{AjDj} 



-l/2Nf{Xj)Bj,nM^,6)/{4n)+o{N)f{Xj), 



as the cross-terms contribute terms of lesser order of magnitude for large N, and with a change of 
variable ^ ^ —S, the terms multiplied by A'^ — 1 cancel. We also note that 

$$E\{B_j C_j\} = -E\{A_j D_j\} = \tfrac{1}{2} N f(\lambda_j) B_{\lambda_j,N}(\xi,\delta)/(4\pi) + o(N) f(\lambda_j),$$

this result characterizing the second order structure of the derivative of the periodogram. This, in combination with OMS and previously derived results, yields (up to terms $o(1) f(\lambda_j)$):



var(V,) 



/(Ai) 











T^JV^3*(^.-..)/ 



(A-19) 



which we shall denote $\Omega_j$. Note that $K_{jj} = 2\pi^2 B_{\lambda_j,N}(\xi,\delta) + C_{\lambda_j,N}(\xi,\delta)$. We are also interested in the



covariance between the terms ij/'^^ , and thus need to calculate 



cov • 



cov 



k I 



-COV { a/'^^D;/'^), Bi/'^)C7(^'^) } + cov l^/'^)!)/'^) , ^i^'^^i?!^'^) } 
Using Isserlis's theorem (see Isserlis (1918)) for zero-mean Gaussian variates we note that
$$E\{X_1Y_1X_2Y_2\} = E\{X_1Y_1\}E\{X_2Y_2\} + E\{X_1X_2\}E\{Y_1Y_2\} + E\{X_1Y_2\}E\{X_2Y_1\}.$$
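The pairing rule in Isserlis's theorem, $E\{X_1Y_1X_2Y_2\} = E\{X_1Y_1\}E\{X_2Y_2\} + E\{X_1X_2\}E\{Y_1Y_2\} + E\{X_1Y_2\}E\{X_2Y_1\}$, can be checked numerically. The following is a minimal sketch and not part of the original derivation: it assumes a hypothetical correlation matrix `S_example` for the zero-mean Gaussian vector $(X_1, Y_1, X_2, Y_2)$, and the function names are illustrative only.

```python
import numpy as np


def isserlis_fourth_moment(S):
    """E{X1 Y1 X2 Y2} for a zero-mean Gaussian vector (X1, Y1, X2, Y2) with covariance S."""
    return S[0, 1] * S[2, 3] + S[0, 2] * S[1, 3] + S[0, 3] * S[1, 2]


def cov_of_products(S):
    """cov{X1 Y1, X2 Y2}: the fourth moment minus E{X1 Y1} E{X2 Y2}."""
    return S[0, 2] * S[1, 3] + S[0, 3] * S[1, 2]


def monte_carlo_fourth_moment(S, n=200_000, seed=0):
    """Simulation estimate of E{X1 Y1 X2 Y2}; used only as a sanity check."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(np.zeros(4), S, size=n)
    return float(np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3]))


# Hypothetical correlation matrix (unit variances), built from a random factor.
_rng = np.random.default_rng(1)
_L = _rng.normal(size=(4, 4))
_S = _L @ _L.T
_d = np.sqrt(np.diag(_S))
S_example = _S / np.outer(_d, _d)

exact = isserlis_fourth_moment(S_example)
approx = monte_carlo_fourth_moment(S_example)
```

The covariance of two products, which is the quantity needed in the proof, drops the first pairing because it equals the product of the two means $E\{X_1Y_1\}E\{X_2Y_2\}$.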
Hence we find that this covariance is equal to



+E {^f ^^4/'^) } E ^^^^'^^ } + E { ^'^^'^^ } E {Df 



,1 1' r 1 



-i'^^y i iVx.M^N its) f - 0(1) - (47r)^ I -^Vx,,x,,N i^,S) 



^(47r)2 / I 



167r2 



[\vx„x„N{^,S)^{2n^Vx„x,M^,S) + Wx,MM^,S)} 



= ^'lvx„x„N i^,s) + lv^^,x,M^,s) + \vx,MM^,mxjMM^,S) + oil) 
= 0{fc-2log'(j)} + 0{fc-*log'(i)} + 0{fc-2log2(j)}+o(l). (A-20) 
Note that the bound for $V_{\lambda_j,\lambda_k,N}$ follows, mutatis mutandis, by the arguments of Robinson (1995). ■

A.8.2 Distribution of the Derivative of the Standardized Periodogram

We now derive the distribution of X^'^^ to be able to determine the distribution of X^'^^ : 

Proposition 5 

4 



ir^-j:ii''Rh+o{l), (A-21) 



k=l 



(i) 

where ^'^^ roots of equation 



1' 



- Bj^oM^^^h^ + [IbIj,^n{^,S) - ^Bx„n{^,^)Cx„n{^,S)^1^ + IKM^,^) 

i Bx^,N{i,S)Cx„N{^,S) 



-2-^Bl^^{i,S)j^-^ 

+ ^Bx^MtSycl^^itS) - l^Bl ,,i^,S)Bx,,Ni^,S)Cx,MC,S) + 1bI ,,{^,S) = 0, 
and Rij are independent unit Gaussian variables across i for each fixed j. This in turn implies 

$$E\{\dot I_j^{(f,N)}\} = \sum_{k=1}^{4} \gamma_k^{(j)} + o(1), \qquad \mathrm{var}\{\dot I_j^{(f,N)}\} = 2 \sum_{k=1}^{4} \{\gamma_k^{(j)}\}^2 + o(1).$$
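The mean and variance stated above are those of a weighted sum of independent $\chi^2_1$ variates, with weights equal to the roots of the quartic. As a hedged illustration (the matrix `M_example` below is hypothetical and is not the matrix of the proposition), the same two moments can be obtained from the eigenvalues of any symmetric matrix defining a Gaussian quadratic form, without solving the quartic explicitly:

```python
import numpy as np


def quadratic_form_moments(M):
    """Mean and variance of Z^T M Z for Z ~ N(0, I) and symmetric M.

    With eigenvalues gamma_k of M, Z^T M Z = sum_k gamma_k R_k^2 for
    independent standard normal R_k, so the mean is sum_k gamma_k and
    the variance is 2 * sum_k gamma_k^2.
    """
    gamma = np.linalg.eigvalsh(M)
    return float(gamma.sum()), float(2.0 * np.sum(gamma ** 2))


# Hypothetical symmetric 4 x 4 matrix; the entries are illustrative only.
M_example = np.array([
    [0.0, 0.1, 0.0, -0.5],
    [0.1, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.2, 0.1],
    [-0.5, 0.0, 0.1, 0.0],
])
mean_qf, var_qf = quadratic_form_moments(M_example)
```

Equivalently, the mean is the trace of the matrix and the variance is twice the trace of its square, which is why only the two leading symmetric functions of the roots are needed.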

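Proof: Firstly note that for a fourth order polynomial with roots $\{\gamma_k^{(j)}\}_{k=1}^{4}$ we find that

$$\prod_{k=1}^{4}\{\gamma - \gamma_k^{(j)}\} = \gamma^4 - \gamma^3\sum_{k=1}^{4}\gamma_k^{(j)} + \gamma^2\sum_{l<k}\gamma_k^{(j)}\gamma_l^{(j)} - \gamma\sum_{u<l<k}\gamma_k^{(j)}\gamma_l^{(j)}\gamma_u^{(j)} + \gamma_1^{(j)}\gamma_2^{(j)}\gamma_3^{(j)}\gamma_4^{(j)}.$$

Also note that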



Thus we find that 



fe=i 



k l<k 



var{±j^'^)} 



fe=i 

2E7r 

fe=i 



(A-22) 



2(i) 



C-A^jvC^, 5)} + 0(1) = ^Bl.^j,{^, 6) + Bx„n{^, S)Cx„n{^, S) + o(l), (A-23) 

this giving the full first and second order structure of the standardized derivative, from the quadratic form. Equation (A-22) matches the previously developed results for the expectation of $\dot I_j^{(f,N)}$, and equation (A-23) gives a compact expression for the variance. Of some interest is now the difference in magnitude
between this quantity and the jth contribution of J^^^^ /N'^, but this is not sufficient to establish the 
large sample properties of the distribution, as — does not converge in probability to a constant if 
suitably standardized. Note that Bx.^n{£^,5) nearly takes the value unity for most j, and for j small 

due to the integrand of B\.^n{^, S) being odd near the origin, clearly C\j,n{^, » 0.5i?^^^(5, 6). 

We therefore to derive a compact expression for the properties of J^-^'^^ to compare the magnitude of 

Cx-,n{£,,S) and Bx.^n{^,6) to justify this argument. 

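To derive eqn (A-21) we use results given in Johnson and Kotz (1970, p. 149-188) on quadratic forms. Note that with

$$T = \begin{pmatrix} 0 & 0 & 0 & -1/2 \\ 0 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 0 \\ -1/2 & 0 & 0 & 0 \end{pmatrix}$$

we have

$$(4\pi)^{-1}\dot I_j^{(f,N)} = V_j^{(f,N)T}\, T\, V_j^{(f,N)}. \quad \text{(A-24)}$$

Firstly define

$$\mathrm{var}\{V_j^{(f,N)}\} = \Omega_j^{(f,N)} + o(1) = C_j C_j^T + o(1), \quad \text{(A-25)}$$

where $\Omega_j^{(f,N)}$ is the normalized version of eqn (A-19) and where $C_j$ is the lower triangular matrix given below, taking for notational purposes $B_{\lambda_j,N} = B_{\lambda_j,N}(\xi,\delta)$ and $C_{\lambda_j,N} = C_{\lambda_j,N}(\xi,\delta)$: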



/ 



23 



2Ba 



2 



1 

47r 



-2Bi 



J_ 
47r 



Note that 



if.N) 



0(1), 



(A-26) 



and thus $Z_j = C_j^{-1} V_j^{(f,N)} \sim \mathcal{N}(0, I_4)$. The quadratic form is then given by (ignoring terms $o(1)$):

$$(4\pi)^{-1}\dot I_j^{(f,N)} = V_j^{(f,N)T}\, T\, V_j^{(f,N)} = Z_j^T C_j^T T C_j Z_j = Z_j^T M_j Z_j,$$

and thus the distribution of this object depends wholly on the eigenvalues of $M_j$. Note that
















Stt 


























0/ 



where 



Stt 



We are interested in $4\pi M_j$, which has eigenvalues $\gamma_k^{(j)}$ given as the solutions of



7^ - BA,,iv(^,^)7' + \IbI,n{^'^) - pA,,iv(^,<5)CA,,;v(^,^) \ 7' + 



+ 



1 B^^^^{^,SrCl^^{^,S) 1 



1 ^, 
We then note from Johnson and Kotz (1970, p. 151) that if we define new variables $R_j$ in terms of the orthogonal matrix of eigenvectors of $M_j$ and $Z_j$, they will be $R_j \sim \mathcal{N}(0, I_4)$, and

$$\dot I_j^{(f,N)} = 4\pi Z_j^T M_j Z_j = \sum_{k=1}^{4} \gamma_k^{(j)} R_{kj}^2 + o(1), \quad \text{(A-27)}$$

thus completing the proof of the proposition, and establishing the marginal distribution of $\dot I_j^{(f,N)}$.
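The reduction used in the proof (factor the covariance, whiten, then rotate onto the eigenvectors) can be sketched as follows. This is an illustration under stated assumptions: the anti-diagonal matrix `T` is one reading of the matrix of the quadratic form, the covariance `omega_example` is hypothetical and is not taken from eqn (A-19), and the constant factor $4\pi$ is omitted.

```python
import numpy as np

# Assumed anti-diagonal matrix of the quadratic form: V^T T V = B*C - A*D for V = (A, B, C, D).
T = np.array([
    [0.0, 0.0, 0.0, -0.5],
    [0.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [-0.5, 0.0, 0.0, 0.0],
])


def reduce_quadratic_form(omega, T=T):
    """Return (gamma, P, C): eigenvalues and eigenvectors of M = C^T T C, where omega = C C^T."""
    C = np.linalg.cholesky(omega)      # lower triangular factor
    M = C.T @ T @ C                    # symmetric because T is symmetric
    gamma, P = np.linalg.eigh(M)       # M = P diag(gamma) P^T
    return gamma, P, C


# Hypothetical positive definite covariance for V (illustrative values only).
omega_example = np.array([
    [1.0, 0.0, 0.0, 0.3],
    [0.0, 1.0, -0.3, 0.0],
    [0.0, -0.3, 2.0, 0.0],
    [0.3, 0.0, 0.0, 2.0],
])
gamma, P, C = reduce_quadratic_form(omega_example)

# One draw: V = C Z with Z ~ N(0, I_4), and R = P^T Z is again N(0, I_4).
z = np.random.default_rng(0).normal(size=4)
v = C @ z
r = P.T @ z
direct = float(v @ T @ v)                   # quadratic form in the correlated variables
via_eigen = float(np.sum(gamma * r ** 2))   # weighted sum of squares of independent variables
```

The two evaluations agree for every draw, which is the content of the eigenvalue representation; the sum of the eigenvalues equals the trace of `T @ omega_example`.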



Proposition 6 The standardized score function satisfies constraint: 

1 



fcjv,i(V'*) 



Ar3/2 

AA(0,a^) 



e^ = KN + 0(1) 



(A-28) 
(A-29) 



where 



i=Ji 

+ N T.{-^Km,n + -V^^,,,M^,6) + -Vx^,x,,N ($,<5) Wx,,x,,N (^,<5) | 
= ^E^A„^(C,<5) + a(l)^f . 



j=Ji 



Proof: PART I {Determining the first and second order properties of kN,i{ip*))- We note that 

= = -^^E^^^'^'Ha,) +0(1) = Y,Mr) + o{l), 
from equation (A-17), the equation defining the random variable $Y_{1,N}(\psi^*)$. We then note from equations
(A-8), (A-23) and (A-18) that:



E{Y,,Niip*)} = o(l) 



GOV 



var^^if^) 



1 11^ 



N 2 



]+o{N-'] 



Thus it follows that: 



— V 



A.,iv(^,^)+2W^A„A„iv(^,<5) 



+ c 



Note that 



— V 



^A,,A„iv(C, ^) + ^A,,A„iv(C, ^) {sttVa,, 



A.,iv(C,^)+2^A,.A.,Jv(^,(5) 



equates to 



^i:C'A,,iv($,<5)+o(l) 



We arrived at this result using the order of Va^^a^.w iA-,^^-, n^^'^) ^^'^ W\j,Xk,N {£,,S) noted in 

eqn (A-20) and that: Bx^,n{C, S) = 1 + O ['-^] , Vx^,x,,n{^,S) = O {iH^} , if log(Ar) < k < j, 



BA„iv(f,<5) = O ()) , T/A,,A„Ar(C,5) = 0{i2|m| , and Cx^M^S) 
We thus have that 



O 



var{ri,jv(V''')} 



var 



r(/.JV) 



1 



27r2 AT TT^ 
^x- + o(l) = -+o(l). 



(A-30) 



Thus to obtain an 0(1) random variate we must consider a standardization of N ^/'^i^ 
PART II {Determining the asymptotic law): In outline, we note: 

'log(j) 



log' or 



+ 0(1) 



fc2 



+ o(l), log(iV) < < J. 



Now we wish to derive conditional expectations, to be able to derive the stated distributional result for $Z_4$. Define for $\log(N) < k < j < N/2$:

fl + 0{j-Hog{j)} \ + 0{j-Hog{j)} O(r') \ 

| + o{r'iog(i)} o(r') i + o{r'iog(i)} 
i + o{r^iog(j)} o(r^) l + o{j-Hog{j)} 
V o(j-i) i + o{j-Mog(i)} i + o{rMog(i)}y 

/0{A;-ilog(i)} 0{fc-ilog(i)} 0{fc-2log(i)}\ 



0{fc-ilog(j)} 0{k-Hog{j)} 0{fc-ilog(i)} 

0{fc-ilog(j)} 0{A:-2log(j)} 0{fc-ilog(i)} 
Vo{fc-2log(j)} 0{fc-ilog(i)} 0{fc-ilog(i)}y 



Then the full covariance matrix of ^Vj^'^'^^ ^k^'^^^ given by S 
and if we define 



0(1), 



-i. = {"i^"''^ - f^irHf^/'^^V^f^i-r^}" = 0(1) + O {fc- log(fc)} + O (fc- log^(i)) , 



then 



We may thus deduce that for log(A'') <k < j < N/2 

^|^(/.iv)|^(/,iv)| ^ o(i-i)+0{r^fc-^log^(i)}+o(l) 



var{j(^'^)|i(/'^)} = ?|: + o{rMog(j)}+0{fc-2log2(j)}+o(l). 



(A-31) 

(A-32) 



These results are reminiscent of results obtained for the periodogram itself, thus using arguments in 
the vein of Hurvich et al. (1998); we argue that for j sufficiently small the sum of the terms over j are 
of negligible magnitude so that when they are standardized by N~^/'^, they decay. 



In fact if we define Uj = 



with I = 0{log(iV)} then 

\og{4>{t)) = log|£;(e*^^^^^ 



and calculate the characteristic function of Yli\j\=i denoted (j){t) 



^)|=log E^e^^'^ '"'E(^e^^-' C/,/-i...^} 



log 



1 + ?: 



o 



'2N 



3 



+ i 



k<J 
t 



logV)1 



k<J 



Jk"^ 



(A-33) 



+0 (n-^'^)) 



O 



k<j 



log'(j) 



2N 



27r2 



+E0 

k<j 



log'(j) 

fc2 



- 



it 

7n 

12ff_ 
'2 3 



k<j 

0{log(7V)} + o(ViV) 



1^2 

2 7V 



( J2 - Ji + 1 - 2Z) = - 



2iV 

177^*2 



27r2 



log^(j) 

fc2 



+ O (iV-3/2j 



( J2 - + 1 - 20 + o{N) I + O 



2 3 
We want the characteristic function of $N^{-1/2}\sum_j U_j$. We split this into two parts, $N^{-1/2}\sum_{|j| \ge l} U_j$ and $N^{-1/2}\sum_{|j| < l} U_j$, and note that the latter sum converges to the point zero. Thus the standardized sum of the $\dot I_j^{(f,N)}$ will converge to a Gaussian random variable with zero mean and variance $\pi^2/3$, or:

$$k_{N,1}(\psi^*) = K_N + o(1) \Rightarrow \mathcal{N}(0, \pi^2/3). \quad \text{(A-34)}$$

In fact, stopping the argument at equation (A-33) and replacing $2\pi^2/3$ by $C_{\lambda_j,N}(\xi,\delta)$, we may deduce
that



Kn + 0(1) AM (^0,Y^Cx,,N{^,6)j . (A-35) 



The approximation of eqn (A-35) may serve as a better approximation to the distribution of $k_{N,1}(\psi^*)$ than the distribution given in eqn (A-34), at moderate values of $N$. ■



A.8.3 Limit behaviour of the Fisher Information

Having established the large sample properties of fc^r i to be able to relate them back to a suitably 
standardized version of ^ we must also establish the large sample behaviour of [WVlix near the true 
value of the pole. We shall use the same normalizations and local regions as defined by Sweeting 
(1992), when treating asymptotic ancillarity. Recall that Bn was defined in equation (19), and refer 

— 1/2 

to the notation specified in this section. To be able to do so define the Bj^ ' neighbourhoods of i/> by 

ATn {iI^*,c) = {ipen: \BM{tp- V*)l < c}. 



Proposition 7 Define (/)% = i^ip : ij, = il,* + B^^^^s, s G M^j^ ^ 



Wn{}P) W, where 

1. 

and Wii N (o, . Furthermore note the finite large sample approximation that for tp = ip* we 
have 

[W^m],, = Wr,,n + o{l), Wmm-mI ^' ^^"^ M (A-37) 

W^jv,ii ^ ^5, Z^^Affo,^), (A-38) 



?), (A-36) 



15 



where 



aji^) = var (xj^'^)) , lim a,(V'*) = 



IE' 



Proof: Distribution of [Wn (•0*)]ii • 

We seek to establish the distribution of $W_N(\psi)$, but intend to start by determining the distribution of $W_N(\psi^*)$. Most of the entries in the matrix are easily established: we have already specified the distribution of $[W_N(\psi)]_{22}$, and the off-diagonal entry, when suitably standardized, converges to zero (see Proposition 8). This implies that three of the entries of $W_N(\psi)$ appropriately converge; the fourth element needs to be determined, and the correlation of the four elements needs to be considered before the limit is taken. We consider $[W_N(\psi^*)]_{11}$ for large sample sizes. As

and we note that $I_j$, $\dot I_j$ and $\ddot I_j$ are quadratic forms in variables $V_j = [A_j, B_j, C_j, D_j, E_j, F_j]^T$, that are more reasonably treated in terms of the standardized forms, we can note that:

= -4? E ^i^'"^^ + 0(1) = ^2,iv (V*) + 0(1). (A-39) 



As for large j, we note that Bx.^n{^, 5) = 0{j~^), and so we find that: 

hm VijA„Jv(C, -5)^^10 = 0(1), 

j 

and thus, 

E {F2,jv = -N-'/^E I J2 ^/'""^ I = 0{N-'/^). (A-40) 

We then consider the variance of $Y_{2,N}(\psi^*)$, to determine the properties of this random variable. To find the full properties of $Y_{2,N}(\psi^*)$ we note that it is a quadratic form in the full set $\{V_j\}$, and replicate our previous treatment of $\{V_j\}$. It transpires that the important properties to establish, for a heuristic argument, are the mean and variance of the random variates $\ddot I_j^{(f,N)}$. The variates are correlated across $j$, but given the weak correlation this need not be accounted for: just as in the previous arguments, the combined correlation, once suitably renormalized, converges to a negligible contribution. After some very lengthy calculations that are not replicated here, we obtain that the variance of $\ddot I_j^{(f,N)}$ is given by:

~a) = var{lf'^)} (A-41) 

= 2V [var{D(/'^)^} +var{A(^-^)i?y-^)} + var{i?f +var{cj^ 

-2cov •^)^ } - 2cov } + 2cov '^)^} 

+2cov{Ay'^)E(^'^\Bf - 2cov{4/'^)isf ^)^} - 2cov {fif '^^i^^^'^^, ^)^} . 

Each of these terms is given by 

var{i?y'^"')2} = var{cf ^^'} = ^ {2n'B,^M^,S) + C^^^M^^^)}' + o{l) 
^(27r2 + 27rV3)' + o(l) = ^+o(l). 



Also 

var{^f^)i;f'^)} = varjBf^V, 



+Cx,M^,S)} + ^ {-27r'SA„iv - C^A„jv(C,<5) +ijA„iv(C,<5)}%o(l) 



2^ + 3^+°W' 
where Cx^^n{^,S) is given by

1 / B,^.Ni^,S)-2r_ 



° \l 

oo I u 



2(5 



16 1 Bo,dM^^S)-2J: 



-2(5 sin(7ru) 



2(0, u) du + J^^\u\ iIjI{0,u) du if j = 



(A-42) 



and ■i/'2(j) = 2 [— cos{7r(it — j)}/{7r(w — j)}^ + sin{7r(it — j)}/{T^{u — j)}^) ■ Finally we note that 



plus 0(1) terms where 



-25 



-1 [sin WU - s)}- cos {nij - s)} 7r(j - s)] 
s z as. 



Also 



+0(1) = 1 + 0(1) 



cov{z);.^■^)^i5f^)^} = 0(1) 



} = <5i3A,,iv(^,<5)+o(l) = 0(r')+o(l). 



Combining these results we find that as $j \to \infty$ and $N \to \infty$, $\sigma_j^2 \approx 104$. Thus for increasing $j$ the variance of $\ddot I_j^{(f,N)}(\psi^*)$ tends to a constant; again the covariance terms will behave like the covariance terms in the score, and the mean is of negligible magnitude. We are thus adding many identically distributed variates with order one variance, and the same weak dependence as before. We can yet again adapt the arguments of Hurvich et al. (1998). The argument will necessarily become very complicated, as we now need to consider a quadratic form in twelve correlated Gaussian variates, and there is no real point in giving the exact details of the argument.

The distribution may, for non-negligible values of $\delta$, be slow to attain, and so for large but more moderate $N$ we propose to use:



l2,jv(V*) = Wnai + 0(1), Wn,ii ~ Af 



\ 3 3 



(A-43) 



For large N we find y/NY.j -Ba,-,jv(^, 5) = o(l) whilst 

$$\frac{1}{N}\sum_j \sigma_j^2 = \frac{1}{N}\,\frac{16\pi^4}{15}\,\frac{N}{2} + o(1) = \frac{8\pi^4}{15} + o(1),$$



and so we may note that $W_{N,11} \Rightarrow Z_5$, $Z_5 \sim \mathcal{N}(0, 8\pi^4/15)$. We furthermore note that as the variance increases linearly with $|J_2 - J_1|$, the distribution of the second derivative at values of $j$ near the pole eventually becomes negligible in influence in the random variable $[W_N(\psi)]_{11}$, and thus the distributional results will also hold for $[W_N(\psi)]_{11}$ when $\psi$ lies in the local neighbourhood of $\psi^*$ defined above, or

$$[W_N(\psi)]_{11} = [W_N(\psi^*)]_{11} + o(1) = Z_5 + o(1). \quad \text{(A-44)}$$
This establishes the distribution of the standardized observed Fisher information of the likelihood. ■
However, before we may combine these results to note the distribution of $\hat\xi$ we must consider the
dependence between N~^t^{'ij}*) and N~^/'^i^^^{il}), which, based on the argument of the distributional 
equivalence of {(■»/'), and and the asymptotic Gaussianity of the variables corresponds to 

bounding the covariance of N~^i^{'ip*) and N~^£^^^{tp*). 



Proposition 8 The restandardized score in ^ and the restandardized observed Fisher information in 

^ evaluated at ip* satisfy cov {kN,i{'4'*)7 [Wn{iI'*)]ii} = We can thus deduce that as 

[Wjv(i/')]^j = [Wjv(''/'*)]ii o,nd asymptotic Gaussianity is valid, asymptotic independence follows. 

Proof: Due to previous arguments of large sample distributional equivalence, and due to the asymptotic Gaussianity, we need only consider the covariance of $Y_{1,N}(\psi^*)$ and $Y_{2,N}(\psi^*)$, and thus start by considering the covariance of the elements that make up these objects. We note that

c^J = ooy{ir\f/'-^} 

We consider the $j = k$ terms and show that their contribution decays suitably in $j$: the cross terms will be bounded as in previous arguments, relying on the decay for $\log(N) < k < j$. Then combining the results of the previous section with Isserlis's theorem we find that (up to $o(1)$):

+ 1 ^^2n^Bx^,N{i,5) - Bx,M^,6) - ^Cx.M^,^)] KM^^^) 

+ ^Bx^M^,m{Kjj) +n^Bx,MU [-IbxM^,S) + 2SC^^^j^i^,^ 

+ i {2n'Bx,,Ni^,S) - ijA,,Jv(^,<5)CA,,Jv(^,<5)} - ^Bx.M^,^) + '^^''Bx„n{^^^)Bx,M^,S)- 

Thus we may deduce Cjj = 0{j~^) + o(l). The cross terms, i.e. Ck,j, may be bounded in a standard 
fashion using the same argument, so that 

$$\mathrm{cov}\{k_{N,1}(\psi^*), [W_N(\psi^*)]_{11}\} = o(1). \quad \text{(A-45)}$$
We can thus deduce the asymptotic independence of the variables $k_{N,1}(\psi^*)$ and $[W_N(\psi)]_{11}$. ■



Proposition 9 The large sample distribution of the MLE of $\xi$ tends to:

where C ~ Cauchy. 

Proof: To show this result we can simply use Propositions 6, 7 and 8. 

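Note on Usage of Asymptotic Form: We have

$$C_N = N(\hat\xi - \xi^*) = k_{N,1}(\psi^*)/[W_N(\psi^*)]_{11}. \quad \text{(A-47)}$$

We note from equations (A-29) and (A-38), using Proposition 8, that

$$C_N \Rightarrow \frac{Z_4}{Z_5} \overset{d}{=} \frac{\sqrt{\pi^2/3}}{\sqrt{8\pi^4/15}}\, C = \frac{\sqrt{5/8}}{\pi}\, C,$$

where $C$ is a standard Cauchy variate. Define $c_1 = \tan(\pi(-\tfrac{1}{2} + 0.025))$ and $c_2 = \tan(\pi(-\tfrac{1}{2} + 0.975))$, then

$$P\{c_1\sqrt{5/8}/\pi < C_N < c_2\sqrt{5/8}/\pi\} \to 0.95, \qquad c_2\sqrt{5/8}/\pi \approx 3.20.$$

Thus a 95% CI is given for $\xi$ by $(\hat\xi - 3.20/N, \hat\xi + 3.20/N)$. This establishes the large sample theory for $\hat\xi$. However, the effect on the MLE of low $j$ contributions decays slowly, and so we provide an additional approximation to the distribution, based on equations (A-28) as well as (A-37).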
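The constant 3.20 in the asymptotic interval can be reproduced directly from the two limiting variances, $\pi^2/3$ for the standardized score and $8\pi^4/15$ for the standardized observed information. The short sketch below does this arithmetic; the function names and the example values passed to `pole_interval` are illustrative assumptions, not quantities from the paper.

```python
import math

VAR_SCORE = math.pi ** 2 / 3        # limiting variance of the standardized score
VAR_INFO = 8 * math.pi ** 4 / 15    # limiting variance of the standardized information


def cauchy_scale():
    """Scale of the limiting Cauchy law: ratio of the two standard deviations."""
    return math.sqrt(VAR_SCORE / VAR_INFO)


def cauchy_half_width(level=0.95):
    """Half-width c with P(|scale * C| < c) = level for a standard Cauchy C."""
    upper_quantile = math.tan(math.pi * (-0.5 + (1.0 + level) / 2.0))
    return upper_quantile * cauchy_scale()


def pole_interval(xi_hat, n, level=0.95):
    """Asymptotic interval xi_hat -/+ half_width / n for the pole location."""
    h = cauchy_half_width(level) / n
    return xi_hat - h, xi_hat + h


half = cauchy_half_width()                    # approximately 3.20
lower, upper = pole_interval(0.25, 1024)      # hypothetical estimate and sample size
```

Because the Cauchy quantile function is $\tan\{\pi(p - 1/2)\}$, no numerical inversion is needed in the asymptotic case; the interval width shrinks at rate $1/N$.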



Note on Usage of Large Sample Approximation Form: For finite $N$, as already discussed, it may be more appropriate to approximate the distribution of the two random variables using $K_N \sim \mathcal{N}(0, \sigma_1^2)$ and $W_{N,11} \sim \mathcal{N}(\mu_2, \sigma_2^2)$, where



I^^ = ^J1 i3A,.Ar(C,<5) =0(1) = ^ E {\5^BlA^^^)+BxM^,8)CxM^,5)^+^ 



(1) 



J2 



J2 



/«2 



^ E B,MU)+o{l) = ^ E '^f +0(1) 



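where $\sigma_j^2$ is given by equation (A-41). With these quantities, we define $C_1 = K_N/W_{N,11}$ and $C_2 = W_{N,11}$, so that $W_{N,11} = C_2$ and $K_N = C_1C_2$, and seek a confidence interval for $C_1$ with $\Pr(c_{11} < C_1 < c_{12}) = 1 - \alpha$. Assuming that asymptotic independence of $K_N$ and $W_{N,11}$ is approximately attained, we have by transformation techniques

$$f_{C_1,C_2}(c_1,c_2) = \frac{|c_2|}{2\pi\sigma_1\sigma_2}\exp\left\{-\frac{c_1^2c_2^2}{2\sigma_1^2} - \frac{(c_2-\mu_2)^2}{2\sigma_2^2}\right\}, \qquad f_{C_1}(c_1) = \int_{-\infty}^{\infty} f_{C_1,C_2}(c_1,c_2)\,dc_2,$$

$$\int_{-\infty}^{c_{11}} f_{C_1}(c)\,dc = \alpha/2, \qquad \int_{-\infty}^{c_{12}} f_{C_1}(c)\,dc = 1 - \alpha/2.$$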

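Thus, once $\mu_2$, $\sigma_1^2$ and $\sigma_2^2$ have been determined by calculating the integrals, we can derive the approximation to the distribution of the estimator of $\xi$. In fact, with $u(c_1) = \sigma_1^2 + c_1^2\sigma_2^2$,

$$f_{C_1}(c_1) = \frac{\sigma_1\sigma_2}{\pi u(c_1)}\, e^{-\mu_2^2/(2\sigma_2^2)} + \frac{\mu_2\sigma_1^2}{\sqrt{2\pi}\,u(c_1)^{3/2}}\, e^{-\mu_2^2c_1^2/\{2u(c_1)\}}\,\mathrm{erf}\left\{\frac{\mu_2\sigma_1}{\sqrt{2}\,\sigma_2\sqrt{u(c_1)}}\right\} \longrightarrow \frac{\sigma_1\sigma_2}{\pi u(c_1)} \quad \text{(A-49)}$$

as $\mu_2 \to 0$, and the distribution becomes a scaled Cauchy distribution. Using this approximation, we may derive CIs for $\xi$ for a given value $\delta$ by determining $c_{11}$ and $c_{12}$ for that value of $\delta$ from

$$P\{c_{11} < N(\hat\xi - \xi^*) < c_{12}\} = P(\hat\xi - c_{12}/N < \xi^* < \hat\xi - c_{11}/N) = 1 - \alpha.$$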
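The finite-sample interval requires the quantiles of a ratio of two independent Gaussian variables. The sketch below is one possible numerical route, under the assumption that the numerator is $\mathcal{N}(0, \sigma_1^2)$ and the denominator is $\mathcal{N}(\mu_2, \sigma_2^2)$: it evaluates the density of the ratio in closed form, computes the coverage $P(|C_1| \le c)$ by conditioning on the denominator, and finds the half-width by bisection (the density is symmetric, so $c_{11} = -c_{12}$). The function names, grid sizes and tolerances are illustrative choices, not the authors' procedure.

```python
import math
import numpy as np


def ratio_pdf(c1, mu2, s1, s2):
    """Density of C1 = K / W with K ~ N(0, s1^2) and W ~ N(mu2, s2^2) independent."""
    u = s1 ** 2 + c1 ** 2 * s2 ** 2
    central = s1 * s2 / (math.pi * u) * math.exp(-mu2 ** 2 / (2 * s2 ** 2))
    shift = (mu2 * s1 ** 2 / (math.sqrt(2 * math.pi) * u ** 1.5)
             * math.exp(-mu2 ** 2 * c1 ** 2 / (2 * u))
             * math.erf(mu2 * s1 / (math.sqrt(2 * u) * s2)))
    return central + shift


def coverage(c, mu2, s1, s2, n_grid=8001):
    """P(|K / W| <= c) = E[erf(c |W| / (s1 sqrt(2)))], by the trapezoidal rule over W."""
    w = np.linspace(mu2 - 10 * s2, mu2 + 10 * s2, n_grid)
    dens = np.exp(-(w - mu2) ** 2 / (2 * s2 ** 2)) / (s2 * math.sqrt(2 * math.pi))
    inner = np.array([math.erf(c * abs(x) / (s1 * math.sqrt(2))) for x in w])
    y = dens * inner
    h = w[1] - w[0]
    return float(h * (y.sum() - 0.5 * (y[0] + y[-1])))


def half_width(mu2, s1, s2, level=0.95, tol=1e-6):
    """Smallest c with P(|K / W| <= c) >= level, found by bisection."""
    lo, hi = 0.0, 1.0
    while coverage(hi, mu2, s1, s2) < level:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid, mu2, s1, s2) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Limiting standard deviations; with mu2 = 0 the scaled Cauchy case is recovered.
s1_lim = math.sqrt(math.pi ** 2 / 3)
s2_lim = math.sqrt(8 * math.pi ** 4 / 15)
c_limit = half_width(0.0, s1_lim, s2_lim)
```

In use, one would pass the values of $\mu_2$, $\sigma_1$ and $\sigma_2$ computed for the sample size and memory parameter at hand; a larger $|\mu_2|$ moves the denominator away from zero and so shortens the interval.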



Long Memory Parameter dependence of the CIs: The $\delta$ dependence is implicit in the distribution of $C_1$ in equation (A-49), as $\mu_2$, $\sigma_1^2$ and $\sigma_2^2$ depend on $\delta$. Thus a $(1 - \alpha)$ CI is simply given by $\hat\xi \pm c_{12}/N$. For a real data set, we do not know the true value of $\delta$, but note that $\hat\delta = \delta^* + Z_2/\sqrt{N \mathcal{I}_{\delta\delta}}$ where $Z_2 \sim \mathcal{N}(0, 1)$, from equation (A-16), as the same central limit argument will be valid for the score evaluated at $\delta$ lying between $\delta^*$ and $\hat\delta$. We note that $c_{11}$ and $c_{12}$ are smooth functions of $\delta$. Making the dependence on $\delta$ explicit we find



Clfe(5*) - Cife(5) 



= AT- 



(A-50) 
Table 10: The quantities necessary to approximate the distribution of $N\hat\xi$ using $C_1$.

    N        $\delta$      95% interval                     $\mu_2$    $\sigma_1^2$   $\sigma_2^2$
    1024     0.30          $\hat\xi \pm 3.17N^{-1}$         0.7778     3.3113         52.9815
    1024     0.40          [illegible]                      3.2203     3.4503         54.9996
    1024     0.45          $\hat\xi \pm 1.43N^{-1}$         9.6724     3.6872         56.5742
    2048     0.30          $\hat\xi \pm 3.18N^{-1}$         0.5606     3.3180         52.5227
    2048     0.40          $\hat\xi \pm 3.00N^{-1}$         2.3937     3.3779         53.6337
    2048     0.45          $\hat\xi \pm 1.96N^{-1}$         7.3754     3.5085         54.6138
    4096     0.30          [illegible]                      0.4021     3.3051         52.2645
    4096     0.40          [illegible]                      1.7646     3.3377         52.8721
    4096     0.45          $\hat\xi \pm 2.41N^{-1}$         5.5698     3.4091         53.4590
    8192     0.30          $\hat\xi \pm 3.20N^{-1}$         0.2874     3.2981         52.1217
    8192     0.40          $\hat\xi \pm 3.14N^{-1}$         1.2921     3.3157         52.4517
    8192     0.45          $\hat\xi \pm 2.?1N^{-1}$         4.1725     3.3544         52.7938
    $\infty$ $\delta > 0$  [illegible]                      [illegible] 3.2899        51.9515

and so as N~^/^ Wiki^*)] l-^2| ^ we can use equation (A-50) with cn and Ci2 calculated at S = 5. 
For our simulation study, to reduce the numerical burden of the procedure, we have calculated the CIs at $\delta^*$. This would not be the approach in a real problem, but given the reduced computational cost of a single calculation of $c_{11}$ and $c_{12}$ for real examples, this is not an issue. ■

Finally, we establish that the scores in $\delta$ and $\xi$ are uncorrelated, as the off-diagonal terms of the standardized observed Fisher information converge to zero.

Proposition 10 We have that cov$\{k_{N,1}(\psi^*), k_{N,2}(\psi^*)\} = o(1)$, and thus we can note that the distributional results follow.

Proof: Note that k^.Ni^P*) = N-^/^iip") and ks,N{ip'') = Tj}'"^ N-^'Hsii}*). We thus consider 
iV-2cov{;j(V'*),?5(i/'*)}- We have 



56 



plus 0(1) terms. Thus it follows 



j k 
1 



-0(1) 



3 k<j 



+EE4^^-{xf-),4/'-)} +0(1) 

3 k 

= ^;^i:i:4"cov{ir'.4'''"}+o(i) 

Note that using Isserlis's theorem (Isserlis, 1918) we have: 
For j = k we have 

for increasing j and N. The cross-terms cov may be bounded in the usual fashion. 

var{7f^)} = 0(l) -~-rf(/,^)i 



Yav{iy'^>} = 0(1). 



Combining these results we find that 



lim 

JV— >(X) 



0. 